STATISTICS IN PSYCHOLOGY 
AND EDUCATION 


BY 
HENRY E. GARRETT, Ph.D.


-pHIRD EDITION 


4955 


LONGMANS, GREEN AND CO. 
New York · London · Toronto




[...] elementary textbook in statistics [...] tempted to add new
material and to eliminate [...]. [...] will have encountered new
techniques [...] important and worth including in a new text.
Furthermore, if he has taught the beginning course in statistics for
many years, elementary (but fundamental) procedures may, through sheer
repetition, [become so] simple and routine as no longer to be considered
worthy of attention. Undoubtedly either or both of these attitudes work
to the disadvantage of the revised book, not to mention the beginning
student. The addition of extensive new materials may easily make a book
almost unusable to a beginner, especially if the added material is of an
advanced nature and not too well integrated with the rest of the text.
And the toning down or elimination of necessary preliminary methods
neglects the fact that each new generation of students begins from
scratch and that things simple to the instructor are not always equally
simple to the student.
In preparing the present (fourth) edition of this book I have tried [to]
avoid the pitfalls of overextension as well as of underemphasis. My
purpose is the same as it was in 1926 when the first edition of this
book was written, namely, to present the fundamentals of statistical
method most useful to students in psychology and education. In
accordance with this plan, I have not included highly specialized
techniques (factor analysis, psychophysical methods, curve fitting),
[...] applicable mainly to test construction, item [...] and the like.
It is my experience that specialized as well as [...] topics belong in
courses designed to follow the elementary [...].

Chapters dealing with reliability and inference have been completely
rewritten and several obsolete and marginally useful techniques dropped
out. One new chapter (Chapter 10) dealing [with] analysis of variance
has been included for those who wish to introduce this topic in the
first course. For the convenience of the instructor the present edition
has been divided into three parts. Part I (Descriptive Statistics)
includes Chapters 1-6; Part II (Prediction and Inference), Chapters
7-11; and Part III (Special Topics), Chapters 12-16. More than a hundred
and fifty examples with answers will be found at the ends of the
chapters.

Although the present edition is about thirty pages shorter than the
earlier, I suspect that it still contains too much material for the
usual beginning course. In a short course (one semester or summer
session) I suggest that the instructor concentrate on Part I, as I doubt
if he can cover more. If the course extends over a year or meets several
times a week, I would add to Part I Chapters 7, 8, [...] 12 and 13.
Also, if time permits, I would teach Chapters 10, 11, [...] and 15, or
assign them as outside work to the better students. Chapter 16 is
supplementary to Chapter 15 and is intended to be used mainly for
reference.

Many teachers who have used this book in the past have been [kind]
enough to offer suggestions looking to its improvement. To all [of]
these go my sincere thanks even though I have not been able [in] every
case to follow their advice. I am indebted to Dr. Lincoln Moses for a
critical reading of Chapters 8, 9 and 10.


Henry E. Garrett


I. Measures in General


1. What is meant by measurement


The measurement of individuals and objects may be of various kinds, and
may be taken to varying degrees of precision. When individuals or things
have been ranked or arranged in a series with respect to some attribute
or trait, we have perhaps the simplest sort of measurement. Children may
be put in order for height, weight, or regularity of school attendance;
salesmen may be ranked for years of experience, or amount of sales over
a year; advertisements or pictures may be ranked for amount of color, or
for cost, or for sales appeal. Rank order tells us serial position in
the group but it does not give us a measurement. We cannot add or
subtract ranks as we can inches or pounds since a person's rank is
always relative to the ranks of other members of his group, and is never
absolute, i.e., in [...] unit.

[...] individuals may also be expressed as scores. [...] in terms of
time taken to complete a task, [...]; less often scores are expressed in
[...] performed, or excellence of the final performance. [...] When
scores are expressed [...] constitute a scale. Scaled tests in
psychology and [...] units or steps but do not possess an absolute zero
point. On the other hand, the "c.g.s. scales" (centimeters, grams,
seconds) of physics do have equal units and an absolute zero point.
"Scores" from physical scales are called measures; they may be added or
subtracted and a "score" of twenty inches, say, is twice a "score" of
ten inches. Scaled scores from mental tests may also be added or
subtracted just as we add and subtract inches. But we cannot say that a
score of 40 achieved on a test is twice as good as a score of 20, since
neither is measured from a zero point of just [...] ability. Traits and
other characteristics, measurements of which are expressible as scores,
are known generally as variables.


2. Continuous and discrete series


In the measurement of mental and social traits, most of the variables
with which we deal fall into continuous series. A continuous series is
one which is capable of any degree of subdivision, although in practice
divisions smaller than some convenient unit are rarely employed.
Measurements of general intelligence illustrate scores which fall into
continuous series. I.Q.'s, for example, may be thought of as increasing
by increments of 1 on an ability continuum which extends from the idiot
to the genius. But there is no reason why with more refined methods of
measurement we should not be able to [get] I.Q.'s of 100.8 or even of
100.83. Physical measures such as height, weight, and cephalic index as
well as scores from mental and educational tests fall into continuous
series: within the given range any measure, integral or fractional, may
exist and have meaning. When gaps occur in a truly continuous series,
these are to be attributed to a failure to measure enough cases, to the
relative crudity of the measuring instrument, or to some other factor of
a like sort, rather than to the lack of measures within the gaps.

Not all variables fall into continuous series. A salary scale in a
department store may run from $10 per week to $20 per week in units of
$1; no one receives, let us say, $17.53 per week. Again, the average
family in a certain locality may work out mathematically to have 2.57
children, although there is obviously a real gap between [...]. [...]
which exhibit real gaps are called discrete or discontinuous. It is
fortunate that nearly all of the [...].

In the following sections we shall define more precisely what is meant
by a score and shall then show how scores may be classified
into what is called a frequency distribution.


THE FREQUENCY DISTRIBUTION

3. The meaning of scores in continuous series


Scores or other numbers in continuous series are to be thought of as
distances along a continuum, rather than as discrete points. An inch is
the linear magnitude between two divisions on a foot-rule; and, in like
manner, a score in a mental test is a unit distance between two limits.
A score of 150 upon an intelligence examination, for example, represents
the interval 149.5 up to 150.5. The exact midpoint of this
score-interval is 150 as shown below.

                 Score 150
      |--------------+--------------|
    149.5           150           150.5

Other scores are to be interpreted in the same way. A score of 8 on the
Thorndike Handwriting Scale, for instance, includes all values from 7.5
up to 8.5; i.e., any value from a point .5 unit below 8, to .5 unit
above 8. Hence, 7.7, 8.0, and 8.4 may all be scored 8. An interval
extending from .5 unit below to .5 unit above the given value is the
usual mathematical meaning of a single score.

There is another and somewhat different meaning which a test score may
have. According to this second view, a score of 150 means that an
individual has done at least 150 items correctly, but not 151. Hence, a
score of 150 represents any value between 150 and 151. Any fractional
value greater than 150, but less than 151, e.g., 150.3 or 150.8, since
it falls within the interval 150-151, is scored simply as 150. The
middle of the score is 150.5. (See below.)

                 Score 150
      |--------------+--------------|
     150           150.5           151

Both of these ways of defining a score are valid and useful. Which [...]
between nine and ten years; will be greater than nine and less than ten
years (middle value 9.5). But "nine years old" must be taken in many
studies to mean 8.5 up to 9.5 years with a middle value of nine years.
The point to remember is that results obtained from treating scores
under our second definition will always be .5 unit higher than results
obtained when scores are taken under the first or mathematical
definition. The student will often have to decide, perhaps somewhat
arbitrarily, which meaning a score should have. As a general rule it is
safer to take the first meaning of a score unless clearly indicated
otherwise. This will be the method followed throughout this book. That
is, scores of 62 and 231, say, will usually mean 61.5 up to 62.5, and
230.5 up to 231.5, and not 62 up to 63, and 231 up to 232.
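The two meanings of a score can be set side by side in a short sketch.
The function name and the labels "mathematical" and "completed" are
chosen here only for illustration; they are not terms used in the text.

```python
def score_limits(score, unit=1.0, definition="mathematical"):
    """Return (lower limit, midpoint, upper limit) of a single score.

    definition="mathematical": the score is the middle of its interval
    (the first meaning described above).
    definition="completed": the score counts whole units finished, so the
    interval begins at the score itself (the second meaning).
    Both labels are hypothetical names used only in this sketch.
    """
    if definition == "mathematical":
        lower = score - unit / 2
    elif definition == "completed":
        lower = float(score)
    else:
        raise ValueError("definition must be 'mathematical' or 'completed'")
    upper = lower + unit
    return lower, (lower + upper) / 2, upper


print(score_limits(150))                          # (149.5, 150.0, 150.5)
print(score_limits(150, definition="completed"))  # (150.0, 150.5, 151.0)
print(score_limits(9, definition="completed"))    # (9.0, 9.5, 10.0)
```

The midpoint under the second definition is always half a unit above
the midpoint under the first, which is the difference the text warns
about.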


II. Drawing Up a Frequency Distribution


1. The classification of measures


Data collected from tests and experiments often have little meaning or
significance until they have been rearranged or classified in a
systematic way. The first task that confronts us, then, is the
organization of our material and this leads naturally to a grouping of
the measures or scores into classes or categories. The procedure in
grouping falls under three main heads:

(1) Determination of the range or the interval between the largest and
smallest scores. The range is found by subtracting the smallest from the
largest score.

(2) Decision as to the number and size of the groups to be used in
classification. The number and size of these class-intervals [will]
depend upon the range of scores and the kind of measures with which we
are dealing.

(3) Tabulation of the separate scores within their proper
class-intervals.

[...] represent the Army Alpha scores earned by [...]. The highest score
is 197, and the lowest 142, [...] exactly 55. In deciding upon the
number of [...] a good general rule is to select by trial [...] not more
than twenty nor less than ten [...].* [55] (the range) divided by 5 (the
interval) gives 11, which is one less than the actual number of
intervals, namely, 12. An interval of three units will yield nineteen
classes; an interval of ten units, six classes.

* This rule m[...] very [...].

TABLE 1 The tabulation of Army Alpha scores made by fifty college
students

1. The original scores ungrouped

185   166   176   145   166   191   177   164   171   174
147   178   176   142#  170   158   171   167   180   178
173   148   168   187   181   172   165   169   173   184
175   156   158   187   156   172   162   193   173   183
197*  181   151   161   153   172   162   179   188   179

* Highest score   # Lowest score

2. The same fifty scores grouped into a frequency distribution

      (1)                (2)              (3)
Class-Intervals        Tallies        f (frequency)
195 up to 200          /                    1
190  "   "  195        //                   2
185  "   "  190        ////                 4
180  "   "  185        /////                5
175  "   "  180        ///// ///            8
170  "   "  175        ///// /////         10
165  "   "  170        ///// /              6
160  "   "  165        ////                 4
155  "   "  160        ////                 4
150  "   "  155        //                   2
145  "   "  150        ///                  3
140  "   "  145        /                    1
                                        N = 50
The tabulation of the separate scores within their class-intervals is
shown in Table 1. In the first column of this table the class-intervals
have been listed serially from the smallest score at the bottom of the
column to the largest score at the top. Each class-interval comprises
exactly five scores. The first interval "140 up to 145" begins with
score 140 and ends with 144, thus including the five scores 140, 141,
142, 143, and 144. The second interval "145 up to 150" begins with 145
and ends with 149, i.e., at score 150. The last interval "195 up to 200"
begins with score 195 and ends at score 200, thus including the scores
195, 196, 197, 198, 199. In column (2), headed "Tallies," the separate
scores have been listed opposite their proper intervals. The first
score, 185, is represented by a tally placed opposite interval "185 up
to 190"; the second score, 147, by a tally placed opposite interval "145
up to 150"; and the third score, 173, by a tally placed opposite "170 up
to 175." The remaining scores have been tabulated in the same way. When
all fifty scores have been listed, the total number of tallies on each
class-interval (i.e., the frequency) is written in column (3) headed f
(frequency). The sum of the f column is called N. When the total
frequency within each class-interval has been tabulated opposite the
proper interval, as shown in column (3), our fifty Army Alpha scores are
arranged in a frequency distribution.

The student will note that the beginning score of the first interval in
the distribution (140 up to 145) has been set at 140 although the lowest
score in the series is 142. When the interval selected for tabulation is
five units it facilitates tabulation as well as computations which come
later if the score limits of the first interval, and, accordingly, of
each successive interval, are multiples of five. A class-interval "142
up to 147" is just as good theoretically as a class-interval "140 up to
145"; but the second is easier to handle from the standpoint of the
arithmetic involved.
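The tallying procedure can be sketched in a few lines. The score list
is Table 1 as read for this sketch; two entries are illegible in the
scanned table and are taken here as 169 and 161, an assumption that
agrees with the tabulated frequencies. The function name is
illustrative.

```python
from collections import Counter

# Fifty Army Alpha scores (Table 1); 169 and 161 are assumed readings
# of two illegible entries.
SCORES = [
    185, 166, 176, 145, 166, 191, 177, 164, 171, 174,
    147, 178, 176, 142, 170, 158, 171, 167, 180, 178,
    173, 148, 168, 187, 181, 172, 165, 169, 173, 184,
    175, 156, 158, 187, 156, 172, 162, 193, 173, 183,
    197, 181, 151, 161, 153, 172, 162, 179, 188, 179,
]


def frequency_distribution(scores, width=5):
    """Return [(lower, upper, f), ...] from the highest interval down.

    Each score is tallied on the interval "lower up to upper", where
    lower is the multiple of `width` at or just below the score.
    """
    tallies = Counter((s // width) * width for s in scores)
    top, bottom = max(tallies), min(tallies)
    return [(start, start + width, tallies.get(start, 0))
            for start in range(top, bottom - 1, -width)]


table = frequency_distribution(SCORES)
for lower, upper, f in table:
    print(f"{lower} up to {upper}: {'/' * f} {f}")
print("N =", sum(f for _, _, f in table))
```

Intervals with no scores are kept with a frequency of zero, so the
listing remains serial even when a gap occurs.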



2. Methods of describing the limits of the class-intervals in a
frequency distribution


Table 2 illustrates three ways of expressing the limits of the
class-intervals in a frequency distribution. In (A), the interval "140
up to 145" means, as we have already seen, that all scores from 140 up
to but not including 145 fall within this grouping. The intervals


TABLE 2 Methods of grouping scores into a frequency distribution

(The data are the fifty Army Alpha scores tabulated in Table 1, p. 5)

          (A)                         (B)                        (C)
Class-          Mid-        Class-              Mid-        Class-     Mid-
Intervals       point   f   Intervals           point   f   Intervals  point   f
195 up to 200   197     1   194.5 up to 199.5   197     1   195-199    197     1
190  "  "  195  192     2   189.5  "  "  194.5  192     2   190-194    192     2
185  "  "  190  187     4   184.5  "  "  189.5  187     4   185-189    187     4
180  "  "  185  182     5   179.5  "  "  184.5  182     5   180-184    182     5
175  "  "  180  177     8   174.5  "  "  179.5  177     8   175-179    177     8
170  "  "  175  172    10   169.5  "  "  174.5  172    10   170-174    172    10
165  "  "  170  167     6   164.5  "  "  169.5  167     6   165-169    167     6
160  "  "  165  162     4   159.5  "  "  164.5  162     4   160-164    162     4
155  "  "  160  157     4   154.5  "  "  159.5  157     4   155-159    157     4
150  "  "  155  152     2   149.5  "  "  154.5  152     2   150-154    152     2
145  "  "  150  147     3   144.5  "  "  149.5  147     3   145-149    147     3
140  "  "  145  142     1   139.5  "  "  144.5  142     1   140-144    142     1
                    N = 50                      N = 50                 N = 50
(B) cover the same distances as in (A), but the upper and lower limits
of each interval are defined more exactly. We have seen (p. 5) that a
score of 140 in a continuous series ordinarily means the interval 139.5
up to 140.5; and that a score of 144 means 143.5 up to 144.5.
Accordingly, to express precisely the fact that an interval begins with
140 and ends with 144, we may write 139.5 (the beginning of score 140)
as the lower limit, and 144.5 (end of score 144 or beginning of score
145) as the upper limit of this step. The class-intervals in (C) express
the same facts more clearly than in (A) and less exactly than in (B).
Thus, "140-144" means that this interval begins with score 140 and ends
with score 144; but the precise limits of the interval are not given.
The diagram below will show how (A), (B), and (C) are three ways of
expressing identically the same facts:


Class-Interval
140 up to 145
139.5 up to 144.5
140-144


Interval                                        Interval
Begins      1      2      3      4      5       Ends
139.5      140    141    142    143    144      144.5


For the rapid tabulation of scores within their proper intervals, method
(C) is to be preferred to (B) or (A). In (A) it is fairly easy, even
when one is on guard, to let a score of 160, say, slip into the interval
"155 up to 160," owing simply to the presence of 160 at the upper limit
of the interval. Method (B) is clumsy and time-consuming because of the
need for writing .5 at the beginning and end of every interval. Method
(C), while easiest for tabulation, offers the difficulty that in later
calculations one must constantly remember that the expressed class
limits are not the actual class limits: that interval "140-144" begins
at 139.5 (not 140) and ends at 144.5 (not 144). If this is clearly
understood, method (C) is as accurate as (B)
or (A). It will be generally used throughout this book.

The scores grouped within a given interval in a frequency distribution
are assumed to be spread evenly over the entire interval. This
assumption is made whether the interval is three, five, or ten units.
[If] we wish to represent all of the scores within a given interval by
some single value, the midpoint of the interval is taken to be the
logical choice. For example, in the interval 175-179 [Table 2, method
(C)] all eight scores upon this interval are represented by the single
value 177, the midpoint of the interval.* Why 177 is the midpoint of
this interval is shown graphically below:

                        Interval
Interval               Midpoint                 Interval
Begins      1      2      3      4      5       Ends
174.5      175    176    177    178    179      179.5


A simple rule for finding the midpoint of an interval is

    Midpoint = lower limit of interval + (upper limit - lower limit)/2

In our illustration, 174.5 + (179.5 - 174.5)/2 = 177. Since the interval
is five units, it follows that the midpoint must be 2.5 units from the
lower limit of the class, i.e., 174.5 + 2.5; or 2.5 units from the upper
limit of the class, i.e., 179.5 - 2.5.
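The midpoint rule may be written out as a small sketch. It starts from
the expressed limits of method (C), converts them to exact limits, and
then applies the rule; both function names are chosen here for
illustration.

```python
def exact_limits(first_score, last_score, unit=1.0):
    """Exact limits of an interval written as 'first_score-last_score'."""
    return first_score - unit / 2, last_score + unit / 2


def midpoint(lower, upper):
    """Midpoint = lower limit + (upper limit - lower limit) / 2."""
    return lower + (upper - lower) / 2


lower, upper = exact_limits(175, 179)
print(lower, upper, midpoint(lower, upper))   # 174.5 179.5 177.0
print(midpoint(*exact_limits(140, 144)))      # 142.0
```

The same midpoint results whether the interval is written as
175-179, as 174.5 up to 179.5, or as 175 up to 180, since all three
describe the same exact limits.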
It is often a question whether the midpoint is, in fact, fairly
representative of all of the scores upon a given interval. Referring to
Table 1, we find that of the ten scores in the class-interval "170 up to
175" (midpoint 172), three (170, 171, 171) are below the midpoint; three
(172, 172, 172) are on the midpoint; and four (173, 173, 173, 174) are
above the midpoint. Of the five scores upon interval "180 up to 185,"
three (180, 181, 181) are below the midpoint (182); and two (183, 184)
are above. The single score of 197 upon interval "195 up to 200" falls
exactly on the midpoint. In these examples the midpoint represents quite
adequately the scores within the given intervals; but it must be
admitted that the balancing of scores above and below the midpoint is
not always so satisfactory as it is here. When the data are scanty, or
when the distribution is badly skewed (p. 97), there may be many more
scores on one side of the midpoint than on the other. When this happens,
the midpoint does [not adequately] represent all of the scores within
the given interval.

The assumption that the midpoint is the most representative score within
an interval holds best when the number of scores in the distribution is
large, and when the intervals are not too broad. But even when neither
of these conditions fully obtains, the midpoint assumption is not
greatly in error and is the best that we can make. In the long run,
about as many scores will fall above as below the various midpoint
values; and lack of balance in one interval will usually be offset by
the opposite condition in another interval.
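How well the midpoints stand for the scores can be examined directly.
The sketch below counts scores below, on, and above the midpoint of an
interval, and then compares the average of the raw scores with the
average obtained when every score is replaced by its interval midpoint.
The average is used here only as a convenient summary; the score list
is Table 1 with two illegible entries assumed to be 169 and 161, and
the function names are illustrative.

```python
# Fifty Army Alpha scores (Table 1); 169 and 161 are assumed readings.
SCORES = [
    185, 166, 176, 145, 166, 191, 177, 164, 171, 174,
    147, 178, 176, 142, 170, 158, 171, 167, 180, 178,
    173, 148, 168, 187, 181, 172, 165, 169, 173, 184,
    175, 156, 158, 187, 156, 172, 162, 193, 173, 183,
    197, 181, 151, 161, 153, 172, 162, 179, 188, 179,
]


def interval_midpoint(first_score, width=5):
    """Midpoint of the interval whose first score is `first_score`."""
    return first_score - 0.5 + width / 2


def balance(scores, first_score, width=5):
    """Return (midpoint, n below, n on, n above) for one interval."""
    inside = [s for s in scores if first_score <= s < first_score + width]
    mid = interval_midpoint(first_score, width)
    below = sum(s < mid for s in inside)
    on = sum(s == mid for s in inside)
    above = sum(s > mid for s in inside)
    return mid, below, on, above


def grouped_average(scores, width=5):
    """Average after replacing each score by its interval midpoint."""
    mids = [interval_midpoint((s // width) * width, width) for s in scores]
    return sum(mids) / len(mids)


print(balance(SCORES, 170))   # (172.0, 3, 3, 4)
print(balance(SCORES, 180))   # (182.0, 3, 0, 2)
print(balance(SCORES, 195))   # (197.0, 0, 1, 0)
print(sum(SCORES) / len(SCORES), grouped_average(SCORES))   # 170.8 170.8
```

With the scores as read here the two averages agree exactly, so the
small imbalances in single intervals cancel over the whole
distribution; such exact agreement should not be expected in general.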


* The same value (namely, 177) is the midpoint of the interval when
methods (A) and (B) are [used].


[...] reference to OY and OX, the coordinate axes. The distance of a
point from O on the X-axis is commonly called the abscissa; and the
distance of the point from O on the Y-axis the ordinate. The abscissa of
point "D" is +9, and the ordinate, -2.


The frequency polygon


(1) CONSTRUCTION OF THE FREQUENCY POLYGON

[Figure 2: frequency polygon. X-axis: Scores, exact interval limits
from 134.5 to 204.5; Y-axis: frequencies, 0 to 10; Mean = 170.8 and
Median = 172 are marked.]

FIG. 2 Frequency polygon plotted from the distribution of fifty Army
Alpha scores given in Table 1, page 5

Figure 2 illustrates the use of the coordinate system in the
construction of a frequency polygon. This graph pictures the frequency
distribution of the 50 Army Alpha scores shown in Table 1, page 5. The
exact limits of the intervals are laid off at regular distances along
the base line (the X-axis) from the origin; and the frequencies within
each interval are measured off upon the Y-axis. There is one score on
the first interval, 140 up to 145 (Table 1, p. 5). To represent this
score on the diagram, we go out on the X-axis to 142, midway between
139.5 and 144.5, and count up one Y-unit. The frequency on the next
interval, 145 up to 150, is three, hence the second point falls midway
between 144.5 and 149.5, three units above the X-axis. The
two scores on interval 150 up to 155, the four scores on 155 up to 160,
and the frequency on each succeeding interval, are represented in every
case by a point the specified number of scores (Y-units) above the
X-axis, and midway between the upper and lower limits of the interval
upon which the f lies. It is important in plotting a frequency polygon
to remember that the midpoint of an interval is always taken to
represent the entire interval. The height of the ordinate at the
midpoint represents all of the scores within the given interval.

When all of the points have been located, they are joined in regular
order to give the frequency polygon* shown in Figure 2. In order to
complete the figure, one interval (134.5 to 139.5) at the low end, and
one interval (199.5 to 204.5) at the high end of the distribution have
been included on the X-scale. The frequency on each of these intervals
is zero at the midpoint; hence by including them we begin the frequency
polygon one-half interval below the first, and end it one-half interval
above the last, class-interval on the X-axis.

In order to give symmetry and balance to a polygon, one must exercise
care in the selection of unit-distances to represent the intervals on
the X-axis and the frequencies on the Y-axis. A too-long X-unit tends to
stretch out the polygon, while a too-short X-unit crowds the separate
points. On the other hand, a too-long Y-unit exaggerates the changes
from interval to interval, and a too-short Y-unit makes the polygon too
flat. A good general rule is to select X- and Y-units which will make
the height of the figure approximately 75% of its width. The ratio of
height to width may vary from 60-80% and the figure still have good
proportions; but it can rarely go below 50% and leave the figure well
balanced. The frequency polygon in Figure 2 illustrates the "75% rule."
There are thirteen class-intervals laid off on the X-axis: twelve full
intervals plus one-half interval at the beginning and at the end of the
range. Hence, our polygon should be 75% of thirteen, or about ten X-axis
units high. These ten units (each equal to one interval) are laid off on
the Y-axis. To determine how many scores (f's) should be assigned to
each unit on the Y-axis, we divide 10, the largest f (on interval 169.5
up to 174.5) by 10, the number of intervals laid off on Y. The result
(i.e., 1) shows that each Y-unit is exactly equal to one f or score, as
shown in Figure 2.
[Figure 5, page] 18, furnishes another illustration [...] preserve
balance. This polygon represents the distribution of 200 cancellation
scores shown in Table 3. There are ten intervals laid off along the base
line or X-axis: nine full intervals plus one-half interval at the
beginning and at the end of the range. Since 75% of 10 is 7.5, the
height of our figure could be either seven or eight X-axis units. To
determine the "best" value for each Y-unit, we divide 52, the largest f
(on 119.5 up to 123.5) by 7, getting 7 3/7; and then by 8, getting 6.5.
Using whole numbers for convenience, evidently we may lay off on the
Y-axis seven units, each representing eight scores; or eight units each
representing seven scores. The first combination was chosen because a
unit of eight f's is somewhat easier to handle than one of seven. A
slightly longer Y-unit representing ten f's would perhaps have been
still more convenient.

* Polygon means "many-sided figure."
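The "75% rule" reduces to a short calculation, sketched below. The
function name is illustrative, and the choice to report both the
whole number below and the whole number above 75% of the width is an
assumption of the sketch, made to match the "seven or eight" of the
second example.

```python
import math


def y_unit_options(x_intervals, largest_f, ratio=0.75):
    """Candidate heights (in X-axis units) and scores per Y-unit.

    x_intervals: number of interval-widths laid off on the X-axis,
                 counting the two half-intervals at the ends as one.
    Returns [(height_in_units, scores_per_unit), ...].
    """
    target = x_intervals * ratio
    heights = sorted({math.floor(target), math.ceil(target)})
    return [(h, largest_f / h) for h in heights]


# Figure 2: thirteen intervals, largest f = 10.
print(y_unit_options(13, 10))

# Table 3: ten intervals, largest f = 52.
for height, per_unit in y_unit_options(10, 52):
    print(height, "units of", math.ceil(per_unit), "scores each")
# 7 units of 8 scores each
# 8 units of 7 scores each
```

Rounding the scores per unit upward guarantees that the largest
frequency still fits within the chosen height.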


TABLE 3 Scores made by 200 adults upon a cancellation test

Class-Interval = 4

Class-Intervals
(Scores)                Midpoint      f
135.5 up to 139.5        137.5        3
131.5  "  "  135.5       133.5        5
127.5  "  "  131.5       129.5       16
123.5  "  "  127.5       125.5       23
119.5  "  "  123.5       121.5       52
115.5  "  "  119.5       117.5       49
111.5  "  "  115.5       113.5       27
107.5  "  "  111.5       109.5       18
103.5  "  "  107.5       105.5        7
                                N = 200


The total frequency (N) of a distribution is represented by the area of
its polygon; that is, the area bounded by the frequency surface and the
X-axis. The area lying above any given interval, however, cannot be
taken as proportional to the number of cases within the interval because
of the irregularities in the distribution and consequently in the
frequency surface. To show the positions of the mean and the median in
the graph, we may locate these measures on the X-axis as shown in
Figures 2 and 5. Perpendiculars erected at these points show the
approximate frequency at the mean and at the median.

Steps involved in constructing a frequency polygon may be summarized as
follows:

(1) Draw two straight lines perpendicular to each other, the vertical
    line near the left side of the paper, the horizontal line near the
    bottom. Label the vertical line (the Y-axis) OY, and the horizontal
    line (the X-axis) OX. Put the O where the two lines intersect. This
    point is the origin.

(2) Lay off the intervals of the frequency distribution at regular
    distances along the X-axis. Begin with the lower limit of the
    interval next below the lowest in the distribution, and end with the
    upper limit of the interval next above the highest in the
    distribution. Label the successive X distances with the interval
    limits. Select an X-unit which will allow all of the intervals to be
    represented easily on the graph paper.

(3) Mark off on the Y-axis successive units to represent the scores (the
    frequencies) on the different intervals. Choose a Y-scale which will
    make the largest frequency (the height) of the polygon approximately
    75% of the width of the figure.

(4) At the midpoint of each interval on the X-axis go up in the Y
    direction a distance equal to the number of scores on the interval.
    Place points at these locations.

(5) Join the points plotted in (4) with straight lines to give the
    frequency surface.
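Steps (2), (4), and (5) amount to listing one point per interval, with
an empty interval added at each end. A minimal sketch follows; it only
computes the coordinates and leaves the drawing to graph paper or to
any plotting tool. The function name is illustrative.

```python
def polygon_points(lowest_exact_limit, width, frequencies):
    """Return the (x, y) points of a frequency polygon.

    frequencies: f's listed from the lowest interval to the highest.
    One empty interval (f = 0) is added below and above, so the polygon
    begins and ends on the X-axis.
    """
    padded = [0] + list(frequencies) + [0]
    first_mid = lowest_exact_limit - width / 2   # midpoint of added low interval
    return [(first_mid + i * width, f) for i, f in enumerate(padded)]


# Army Alpha distribution of Table 1, lowest interval first.
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
points = polygon_points(139.5, 5, freqs)
print(points[:3])    # [(137.0, 0), (142.0, 1), (147.0, 3)]
print(points[-1])    # (202.0, 0)
```

Joining these points in order with straight lines gives the frequency
surface described in step (5).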


(2) SMOOTHING THE FREQUENCY POLYGON


Because the sample is small (N = 50) and the frequency distribution
somewhat irregular, the polygon in Figure 2 tends to be jagged in
outline. To iron out chance irregularities, and also get a better notion
of how the figure might look if the data were more numerous, the
frequency polygon may be "smoothed" as shown in Figure 3, below. In
smoothing, a series of "moving" or "running" averages are taken from
which new or adjusted frequencies are determined. The method is
illustrated in Figure 3. To find an adjusted or [...]
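The running-average idea may be sketched as follows. The text breaks
off before the exact rule is stated, so the rule used here is an
assumption: each adjusted f is the average of the f on the interval
and the f's on the two adjacent intervals, with intervals outside the
distribution counted as zero. The function name is illustrative.

```python
def smooth(frequencies):
    """Three-point running average of a list of frequencies.

    Assumed rule (not confirmed by the surviving text):
        adjusted f = (f below + f + f above) / 3
    One empty interval is added at each end, so the result has two more
    entries than the input and the total N is unchanged.
    """
    padded = [0, 0] + list(frequencies) + [0, 0]
    return [sum(padded[i - 1:i + 2]) / 3 for i in range(1, len(padded) - 1)]


# Army Alpha distribution of Table 1, lowest interval first.
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
adjusted = smooth(freqs)
print([round(f, 2) for f in adjusted])
print(round(sum(adjusted), 6))   # total frequency is preserved
```

Because every original f enters exactly three averages, the adjusted
frequencies still add up to N under this rule.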


[Figure 3: original and smoothed frequency polygons. X-axis: Scores,
exact interval limits; Y-axis: frequencies, 0 to 10.]

FIG. 3 Original and smoothed frequency polygon. The original and
smoothed f's are given below


[...] to height and width of figure, outlined on page 12 for the
frequency polygon, should be observed.

Although in a histogram each interval is represented by a separate
rectangle, it is not necessary to project the sides of the rectangles to
the base line as is done in Figure 4, below. The rise or fall of the
boundary line shows the increase or decrease in the number of scores
from interval to interval and is usually the important fact to be
brought out (see Fig. 5). As in a frequency polygon, the total frequency
(N) is represented by the area of the histogram. In contrast to the
frequency polygon, however, the area of each rectangle in a histogram is
directly proportional to the number of measures within the interval. For
this reason, the histogram presents an accurate picture of the relative
proportions of the total frequency from interval to interval.

[Figure 4: histogram. X-axis: Scores; Y-axis: frequencies, 0 to 10; the
mean and the median are marked.]

FIG. 4 Histogram of the fifty Army Alpha scores shown in Table 1,
page 5

In order to provide a more detailed comparison of the two types of
frequency graph, the distribution in Table 3, page 13, is plotted upon
the same coordinate axes in Figure 5, page 18, as a frequency polygon
and as a histogram. The increased number of cases and the more
symmetrical arrangement of scores in the distribution make [these
figures] more regular in appearance than those in Figures 2 [and 4].

[Figure 5: frequency polygon and histogram on the same axes. X-axis:
Scores, exact interval limits from 103.5 to 139.5; Y-axis:
Frequencies.]

FIG. 5 Frequency polygon and histogram of 200 cancellation scores
shown in Table 3, page 13
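The difference between the two graphs in their treatment of area can be
verified numerically. The sketch below uses the Table 3 frequencies
(interval of 4 units) and compares, for a single interval, the area of
the histogram rectangle with the area lying under the polygon over the
same interval. Function names are illustrative.

```python
def histogram_area(width, f):
    """Area of one histogram rectangle."""
    return width * f


def polygon_total_area(width, freqs):
    """Area under the whole polygon (empty interval added at each end)."""
    padded = [0] + list(freqs) + [0]
    return sum(width * (a + b) / 2 for a, b in zip(padded, padded[1:]))


def polygon_area_over_interval(width, freqs, i):
    """Area under the polygon between the exact limits of interval i."""
    padded = [0] + list(freqs) + [0]
    below, f, above = padded[i], padded[i + 1], padded[i + 2]
    left = (below + f) / 2     # polygon height at the lower limit
    right = (f + above) / 2    # polygon height at the upper limit
    half = width / 2
    return half * (left + f) / 2 + half * (f + right) / 2


# Table 3 frequencies, lowest interval (103.5 up to 107.5) first.
freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]
width = 4

print(sum(histogram_area(width, f) for f in freqs))   # 800
print(polygon_total_area(width, freqs))               # 800.0
print(histogram_area(width, 52))                      # 208
print(polygon_area_over_interval(width, freqs, 4))    # 192.0
```

Both figures enclose the same total area (the interval size times N),
but over the interval 119.5 up to 123.5 the polygon encloses less than
the rectangle, so only the histogram is proportional interval by
interval.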


4. Plotting two frequency distributions on the same axes, when samples
differ in size

Table 4 gives the distributions of scores on an achievement examination
made by two groups, A and B, which differ considerably in size. Group A
has 60 cases, Group B, 160 cases. If the two distributions in Table 4
are plotted as polygons or as histograms on the same coordinate axes,
the fact that the f's of Group B are so much larger than those of Group
A makes it hard to compare directly the range


TABLE 4

    (1)            (2)        (3)          (4)           (5)
Achievement      Group A    Group B      Group A       Group B
Examination         f          f        Percentage    Percentage
  Scores                                Frequencies   Frequencies
   80-89            0          9           0.0           5.6
   70-79            3         12           5.0           7.5
   60-69           10         32          16.7          20.0
   50-59           16         48          26.7          30.0
   40-49           12         27          20.0          16.9
   30-39            9         20          15.0          12.5
   20-29            6         12          10.0           7.5
   10-19            4          0           6.7           0.0
                   60        160
and quality of achievement in the two groups. A useful device in cases
where the N's differ in size is to express both distributions in
percentage frequencies as shown in Table 4. Both N's are now 100, and
the f's are comparable from interval to interval. For example, we know
at once that 26.7% of Group A and 30% of Group B made scores of 50
through 59, and that 5% of the A's and 7.5% of the B's scored from 70 to
79. Frequency polygons representing the two distributions, in which
percentage frequencies instead of original f's have been plotted on the
same axes, are shown in Figure 6. These polygons provide an immediate
comparison of the relative achievement of our two groups not given by
polygons plotted from original frequencies.
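Percentage frequencies of the kind shown in Table 4 can be computed as
in the sketch below: each f is divided by N and multiplied by 100, and
the result is rounded to one decimal place. The function name is
illustrative.

```python
def percentage_frequencies(freqs, decimals=1):
    """Express each f as a percentage of N, rounded to `decimals` places."""
    n = sum(freqs)
    return [round(f * 100 / n, decimals) for f in freqs]


# Table 4, highest interval (80-89) first.
group_a = [0, 3, 10, 16, 12, 9, 6, 4]      # N = 60
group_b = [9, 12, 32, 48, 27, 20, 12, 0]   # N = 160

print(percentage_frequencies(group_a))
# [0.0, 5.0, 16.7, 26.7, 20.0, 15.0, 10.0, 6.7]
print(percentage_frequencies(group_b))
```

Because each percentage is rounded separately, the rounded values of a
group may add up to a shade more or less than 100.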


[Figure 6: two frequency polygons, Groups A and B. X-axis: Scores,
exact interval limits from 9.5 to 99.5; Y-axis: Percentage Frequencies,
0 to 30.]

FIG. 6 Frequency polygons of the two distributions in Table 4. Scores
are laid off on the X-axis, percentage frequencies on the Y-axis


Percentage frequencies are readily found by dividing each f by N and
multiplying by 100. Thus 3/60 × 100 = 5.0. A simple method of finding
percentage frequencies when a calculating machine is available is to
divide 100 by N and, putting this figure in the machine, to multiply
each f in turn by it. For example: 1.667 (i.e., 100/60) × 3 = 5.0;
1.667 × 10 = 16.7, etc.; .625 (i.e., 100/160) × 9 = 5.6, .625 × 12 =
7.5, etc. What percentage frequencies do, in effect, is to scale each
distribution down to the same total N of 100, thus permitting a
comparison of f's for each interval.


5. When to use the frequency polygon and when to use the histogram


The question of when to use the frequency polygon and when to
use the histogram cannot be answered by a general rule which will
cover all cases. The frequency polygon is less exact than the histo-
gram in that it does not represent accurately, i.e., in terms of area,
the number of measures within successive intervals. In comparing
two or more graphs plotted on the same axes, however, the frequency
polygon is the more useful, since the vertical and horizontal lines in
the two histograms will often coincide. Both the histogram and the
frequency polygon tell the same story and both are useful in enabling
us to show in graphic form whether the scores of a group are dis-
tributed symmetrically or whether they are piled up at the low or at
the high end of the scale. Not only information with regard to the
group, but information with regard to the test, may be secured from
a graph. If a test is too easy, the scores will crowd the high end of
the scale; if the test is too hard, the scores will pile up at the low end
of the scale. If the test is well suited to the group, scores will tend to


is happens, the frequency graph approx” 
or normal frequency curve described in Chapter 5 


IV. Standards of Accuracy in Computation * 


throws away legitimate data. More often 
in too many decimals, a practice which may 
of great precision not always justified by th 


1. Rounded numbers


In calculation, numbers are usually "rounded" off to the standard
of accuracy demanded by the problem. If we round off 8.6354 to two
decimals it becomes 8.64; to one decimal, 8.6; to the nearest in-
teger, 9. Measures of central tendency and variability, coefficients
of correlation, and other measures, are rarely reported to more than
two decimal places. A mean of 52.6872, for example, is usually
reported as 52.69; a standard deviation of 12.3841 as 12.38; and a
coefficient of correlation of .6350 as .63, etc. It is very doubtful
whether much of the work in mental measurement warrants accuracy
beyond the second decimal. Convenient rules for rounding numbers
to two decimals are as follows: When the third decimal is less than
5, drop it; when greater than 5, increase the preceding figure by 1;
when exactly 5, compute the fourth decimal and correct back to the
second place; when exactly 5 followed by zeros, drop it and make
no correction.

* This section should be reviewed frequently when solving the
problems given in succeeding chapters.
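The rounding rules can be tried out with a minimal sketch. It assumes Python's standard `decimal` module and a hypothetical helper name, `round_two_decimals`; the rounding mode `ROUND_HALF_DOWN` is chosen here because it drops a third decimal that is exactly 5 followed only by zeros, while any further non-zero digit still pushes the second decimal up.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def round_two_decimals(numeral):
    """Round a number, given as a string, to two decimal places."""
    return Decimal(numeral).quantize(Decimal("0.01"), rounding=ROUND_HALF_DOWN)


for numeral in ["8.6354", "52.6872", "12.3841", ".6350"]:
    print(numeral, "->", round_two_decimals(numeral))
```

Numbers are passed as strings so that the digits are taken exactly as written rather than as binary floating-point approximations.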


2. Significant figures


The measurement 64.3 inches is assumed to be correct to the near-
est tenth of an inch, its true value lying somewhere between 64.25
and 64.35 inches. Two places to the left of the decimal point, and
one to the right are fixed, and hence 64.3 is said to contain three sig-
nificant figures. The numbers 643 and .643 also contain three signifi-
cant figures each.

In the number .003046 there are four significant figures, 3, 0, 4,
and 6, the first two zeros serving merely to locate the decimal point.
When used to locate a decimal point only, a zero is not considered to
be a significant figure; .004, for example, has only one significant
figure, the two zeros simply fixing the position of 4, the significant
figure. The following illustrations should make clear the matter of
significant figures:


136      has three significant figures.
136,000  has three significant figures also. The true value of this
         number lies between 136,500 and 135,500. Only the first three
         digits are definitely fixed, the zeros serving simply to locate
         the decimal point or fix the size of the number.
1360.    has four significant figures; the decimal indicates that the
         zero in the units place is known, and hence significant.
.136     has three significant figures.
.1360    has four significant figures; the zero fixes the fourth place.
.00136   has three significant figures; the first two zeros merely
         locate the decimal point.
2.00135  has six significant figures; the integer, 2, makes the two
         zeros to the right of the decimal point significant.
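The counting conventions for significant figures can be sketched in code. The helper `significant_figures` below is a hypothetical name, and it assumes numbers are supplied as written strings: leading zeros are ignored, and trailing zeros count only when a decimal point is present.

```python
def significant_figures(numeral):
    """Count significant figures in a number written as a string."""
    digits = numeral.replace(",", "").lstrip("+-")
    has_point = "." in digits
    digits = digits.replace(".", "").lstrip("0")  # leading zeros only locate the point
    if not has_point:
        digits = digits.rstrip("0")  # trailing zeros of a whole number fix its size only
    return len(digits)


for numeral in ["64.3", ".003046", ".004", "136,000", "1360.", ".1360"]:
    print(numeral, significant_figures(numeral))
```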


3. Exact and approximate numbers


It is necessary in calculation to make a distinction between exact
and approximate numbers. An exact number is one which is found
by counting: ten children, 150 test scores, twenty desks are exam-
ples. Approximate numbers result from the measurement of variable
quantities. Test scores and other measures, for example, are approx-
imate since they are represented by intervals and not exact points on
some scale. Thus a score of 61 may be any value from 60.5 up to
61.5 and a measured height of 47.5 inches may be any value from
47.45 up to 47.55 inches (see p. 3). Calculations with exact num-
bers may, in general, be carried to as many decimals as we please,
since we may assume as many significant figures as we wish. For
example, 110 test scores, which means that exactly 110 subjects were
tested, could be written N = 110.000 . . . i.e., to n significant figures.
Calculations based upon approximate numbers depend upon, and
are limited by, the number of significant figures in the numbers which
enter into the calculations. This will be made clearer in the follow-
ing rules:


4. Rules for computation 


(1) ACCURACY OF A PRODUCT
(a) The number of significant figures in the product of two or
more approximate numbers will equal the number of significant fig-
ures in that one of the numbers which is the least accurate, i.e., which
contains the smallest number of significant figures. To illustrate:


125.5 × 7.0 = 880, not 878.5, because 7.0, the less accurate of the two
numbers, contains only two significant figures. The number
125.5 contains four significant figures.

125.5 × 7.000 = 878.5. Both numbers now contain four significant figures;
hence their product also contains four significant figures.

(b) When multiplying an exact number by an approximate num-
ber, the number of significant figures in the product is determined
by the number of significant figures in the approximate number. To
illustrate:


If each of 12 children (12 is an exact number) has an M.A. of 8 years
(8 is an approximate number) the product 12 × 8 must be written either
as 90 or 100, since the approximate number has only one significant digit.
If, however, each M.A. of 8 years can be written as 8.0, the product
can be written as 96, since 8.0 contains two significant digits.
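The product rule can be illustrated with a small sketch. The helper `round_to_significant` is a hypothetical name, and the number of figures to keep is supplied by hand from the least accurate factor; the code only performs the rounding.

```python
from math import floor, log10


def round_to_significant(value, figures):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))  # position of the leading digit
    return round(value, figures - 1 - exponent)


print(round_to_significant(125.5 * 7.0, 2))    # 7.0 carries two significant figures
print(round_to_significant(125.5 * 7.000, 4))  # both factors carry four
print(round_to_significant(12 * 8.0, 2))       # 12 exact, 8.0 carries two
```

The same helper applies to quotients, since the rule for division is stated in the same terms.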


(2) ACCURACY OF A QUOTIENT

(a) When dividing one approximate number by another approxi-
mate number, the significant figures in the quotient will equal the
significant figures in that one of the two numbers (dividend or divi-
sor) which is less accurate, i.e., which has the smaller number of
significant digits. Illustrations:


9.27 ÷ 41 should be written .23, not .22609, since 41 (the less accurate
number) contains only two significant figures.

.05419 ÷ 16 should be written .0034, not .0033869, since 16 (the less
accurate number) has two significant figures.

(b) When dividing an approximate number by an exact number,
the number of significant figures in the quotient will equal the number
of significant figures in the approximate number. Illustrations:


9.27 ÷ 41 should be written .226, since 9.27, the approximate number, has
three significant figures. The number 41 is an exact number.

8541 ÷ 50 should be written 170.8, not 170.82, since 8541, the approximate
number, contains only four significant figures.

(c) In dealing with exact numbers, quotients may be written to
as many decimals as one wishes.


(3) ACCURACY OF A ROOT OR POWER

(a) The square root of an approximate number can contain no
more significant figures than there are in the number itself. The
number of significant figures retained in a square root is usually less
than (often one-half) the number of significant figures in the number.
For example, √159.5600 is usually written 12.63, and not 12.6317,
although the original number, 159.5600, contains seven significant
figures.

(b) The square, or higher power, of an approximate number con-
tains as many significant figures as there are in the original number
(and no more). For example, (.034)² = .0012 (two significant fig-
ures) and not .001156 (four significant figures).

(c) Roots and powers of exact numbers may be taken to as many
decimal places as one wishes.


(4) ACCURACY OF A SUM OR DIFFERENCE
The number of decimal places to be retained in a sum or difference
should be no greater than the number of decimals in the least accu-
rate of the numbers added or subtracted. Illustrations:


362.2 + 18.225 + 5.3062 = 385.7, not 385.7312, since the least accurate
number (362.2) contains only one decimal.
362.2 − 18.245 = 344.0, not 343.955, since the less accurate num-
ber (362.2) contains only one decimal.
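The rule for sums and differences can be sketched as follows. The function name `add_approximate` is an assumption made for illustration; numbers are given as strings so that the count of written decimals is preserved, and a subtraction is expressed by passing a negative term.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def add_approximate(*terms):
    """Add numbers given as strings, keeping only as many decimals as
    the term written with the fewest decimal places."""
    values = [Decimal(t) for t in terms]
    fewest = min(-v.as_tuple().exponent for v in values)  # decimals in least accurate term
    total = sum(values, Decimal(0))
    return total.quantize(Decimal(1).scaleb(-fewest), rounding=ROUND_HALF_DOWN)


print(add_approximate("362.2", "18.225", "5.3062"))
print(add_approximate("362.2", "-18.245"))
```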


PROBLEMS 


1. Indicate which of the following variables fall into continuous and which
into discrete series: (a) time; (b) salaries in a large business firm; (c)
sizes of elementary school classes; (d) age; (e) census data; (f) distance
traveled by car; (g) football scores; (h) weight; (i) numbers of pages
in 100 books; (j) mental ages.

2. Write the exact upper and lower limits of the following scores in ac-
cordance with the two definitions of a score in continuous series, given
on pages 3 and 4:

62 175 1
8 312 87


3. Suppose that sets of scores have the ranges given below. Indicate how
large an interval, and how many intervals, you would suggest for use in
drawing up a frequency distribution of each set.


Range          Size of Interval    Number of Intervals

16 to 87
0 to 46
110 to 212
63 to 151
4 to 12
4. In each of the following write (a) the exact lower and upper limits of
the class-intervals (following the first definition of a score, given on page
3), and (b) the midpoint of each interval.

45-47 162.5-167.5 63-67 0-9
1-4 80 up to 90 16-17 25-28


5. (a) Tabulate the following twenty-five scores into two frequency dis-
tributions, using (1) an interval of three, and (2) an interval of five
units. Let the first interval begin with score 60.


72 75 TA ) 
81 78 65 86 78 | 
67 82 76 76 70 


83 Are 63 
61 67 84 69 64 


(b) The following 100 scores were made on the Thorndike Intelligence
Examination for High School Graduates by applicants for admis-
sion to college. Tabulate these scores into three frequency distribu-
tions, using class-intervals of three, five, and ten units. Let the first
interval begin with score 45.


63 78 76 58 95 
78 86 80 96 94
46 78 92 86 88 
82 101 102 70 50 
74 65 73 72 91
103 90 87 74 83 
78 75 70 84 98 
86 73 85 99 93 
103 90 79 81 83 
87 86 93 89 76 
73 86 89 71 94 
95 84 90 73 75 
82 86 83 63 56 
89 76 81 105 73 
73 75 85 74 95 
92 83 72 98 110 
85 103 81 78 98
80 86 96 78 “ті 
81 84 81 83 92 
90 85 85 96 72 


6. The following lists represent the final grades made by two sections of the
same course in general psychology.

(a) Tabulate the grades into frequency distributions using an interval
of 5. Begin with 45 in Section I and 50 in Section II.

(b) Represent these frequency distributions as frequency polygons on
the same axes.


Section I (N = 64)          Section II (N = 46)


67 500 51. 170) 90 84 73 78 58 84 
79 81 81 58 76 72 80 74 86 52 74 
(б 640071 72, 57 90 87 92 78 62 
90 76 71 88 66 81 82 76 85 85 90 
71 65 62 65 76 84 79 54 94 81 
80 71 76 54 80 10 97 65 66 77 
63 87 91 90 45 89 69 56 57 
50 47 67 67 52 62 95 65 71

7. (a) Plot frequency polygons for the two distributions of 25 scores found
in 5(a), using intervals of 3 and of 5 score units. Smooth both dis-
tributions (see p. 14) and plot the smoothed f's and the original
scores on the same axes.

(b) Plot a frequency polygon of the 100 scores in 5(b) using an interval
of 10 score units. Superimpose a histogram upon the frequency
polygon.

(c) On the same axes, plot a frequency polygon and histogram of the
100 Thorndike scores using an interval of 5 score units. Smooth the
frequency polygon and plot on the same diagram.


8. Reduce the distributions A and B below to percentage frequencies and
plot them as frequency polygons on the same axes. Is your understand-
ing of the achievement of these groups advanced by this treatment of
the data?


Scores    Group A    Group B
52-55        1          8
48-51        0          5
44-47        5         12
40-43       10         58
36-39       20         40
32-35       12         22
28-31        8         10
24-27        2         15
20-23        3          5
16-19        4          0
            65        175


9. (a) Round off the following numbers to two decimals:

3.5872 74.168 126.83500
46.9223 25.193 81.72558

(b) How many significant figures in each of the following:

.00046 91.00 1.03
46.02 18.365 15.0048

(c) Write the answers to the following:

127.4 × .0036 =       (both numbers approximate)
200.0 ÷ 5.63 =        (both numbers approximate)
62 × .053 =           (first number exact, second approximate)
364.2 + 61.596 =
364.2 − 61.596 =
√47.86 =
(18.6)² =


ANSWERS


2. 61.5 to 62.5 and 62.0 to 63.0; 174.5 to 175.5 and 175.0 to 176.0;
7.5 to 8.5 and 8.0 to 9.0; 311.5 to 312.5 and 312.0 to 313.0;
.5 to 1.5 and 1.0 to 2.0; 86.5 to 87.5 and 87.0 to 88.0

3. Size of Interval      No. of Intervals
   5                     15
   3 or 4 or 5           16 or 12 or 10
   10                    11
   5 or 10               18 or 9
   1                     9

4. Limits                Midpoint
   44.5 to 47.5          46.0
   .5 to 4.5             2.5
   162.5 to 167.5        165.0
   79.5 to 89.5          84.5
   62.5 to 67.5          65.0
   15.5 to 17.5          16.5
   −.5 to 9.5            4.5
   24.5 to 28.5          26.5

9. (a) 3.59 74.17 126.83
       46.92 25.19 81.73
   (b) 2 4 3
       4 5 6
   (c) .46
       35.5
       3.3
       425.8
       302.6
       6.918 or 6.92
       346


№ 


MEASURES OF CENTRAL TENDENCY 


+ 


When scores or other measures have been tabulated into a
frequency distribution, as shown in Chapter 1, usually the next task
is to calculate one or more measures of central tendency. The value
of a measure of central tendency is twofold. First, it is a single
measure which represents all of the scores made by the group, and
as such gives a concise description of the performance of the group
as a whole; and second, it enables us to compare two or more groups
in terms of typical performance. There are three "averages" or
measures of central tendency in common use, (1) the arithmetic
mean, (2) the median, and (3) the mode. Popularly, the average is
the term used for the arithmetic mean. In statistical work, however,
average is often used as a general term for any measure of central
tendency.


I. Calculation of Measures of Central Tendency


1. The arithmetic mean or "average" (M)


(1) CALCULATION OF THE MEAN WHEN DATA ARE UNGROUPED
The arithmetic mean or simply the mean is the best known meas-
ure of central tendency. It may be defined as the sum of the separate
scores or other measures divided by their number. To illustrate: if a
man earns $3, $4, $3.50, $5, and $4.50 on five successive days his
mean daily wage ($4.00) is obtained by dividing the sum of his daily
earnings by the number of days he has worked. The formula for the
arithmetic mean (M) of a series of ungrouped measures is

$$M = \frac{\Sigma X}{N} \qquad (1)$$

(arithmetic mean calculated from ungrouped data)
in which N is the number of measures in the series, X stands for a
score or other measure, and the symbol Σ means "sum of," here sum
of scores.
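Formula (1) can be sketched in a few lines. The function name `arithmetic_mean` is an illustrative assumption; the data are the five daily wages of the example.

```python
def arithmetic_mean(measures):
    """Formula (1): the sum of the measures divided by their number N."""
    return sum(measures) / len(measures)


daily_wages = [3.00, 4.00, 3.50, 5.00, 4.50]
print(arithmetic_mean(daily_wages))  # mean daily wage in dollars
```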


(2) CALCULATION OF THE MEAN FROM DATA GROUPED INTO A FRE-
QUENCY DISTRIBUTION
When measures have been grouped into a frequency distribution,
the arithmetic mean is calculated by a slightly different method from
the one given above. The two illustrations given in Table 5, page 30,
will make the differences clear. The first example shows the calcula-
tion of the mean of the 50 Army Alpha scores which were tabulated
into a frequency distribution in Table 1. First calculate the fX col-
umn by multiplying the midpoint (X) of each interval by the num-
ber of scores (f) on it; the mean (170.80) is then simply the sum of
the fX (namely, 8540) divided by N (50). The use of the midpoint
to represent the scores within an interval is made necessary by the
fact that scores grouped into intervals lose their identity and must
therefore be represented by the midpoint of that particular interval in
which they fall. Hence, we multiply the midpoint of each interval
by the frequency upon that interval; add the fX and divide by N
to obtain the mean. The formula may be written

$$M = \frac{\Sigma fX}{N} \qquad (2)$$
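The grouped-data calculation can be sketched as below, using the midpoints and frequencies of the fifty Army Alpha scores in Table 5. The function name `grouped_mean` is an assumption made for illustration.

```python
def grouped_mean(midpoints, freqs):
    """Formula (2): sum of f times X over all intervals, divided by N."""
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    n = sum(freqs)
    return total_fx / n


# Table 5 (1), from interval 195-199 down to 140-144.
midpoints = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
freqs = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

print(sum(f * x for x, f in zip(midpoints, freqs)))  # the fX column total
print(grouped_mean(midpoints, freqs))
```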


The second example in Table 5 is another illustration of the cal-
culation of the mean from grouped data. This frequency distribution
represents 200 scores made by a group of adults upon a cancellation
test. Scores have been classified by method (B), page 6, into 9
class-intervals; and since the intervals are 4 units, the midpoints are
found by adding one-half of 4 to the lower limit of each. For exam-
ple, in the first interval, 103.5 + 2.0 = 105.5. The fX column totals
23,888.0; and N equals 200. Hence, applying formula (2), the arith-
metic mean is found to be 119.44 (to two decimals).

In both of the illustrations in Table 5, the M of the scores made
by the members of a group was found. We may, however, use either
formula (1) or (2) to calculate the M of a number of measurements
made upon the same individual. If an individual's reaction time to
light is measured 100 times, and the measures tabulated into a fre-
quency distribution, the M is found in exactly the same way in which
we compute the "average" reaction time to light of 100 different
observers.


TABLE 5 The calculation of the mean, median, and crude mode from
data grouped into a frequency distribution


1. Data from Table 1, fifty Army Alpha scores
Class-interval = 5

Class-Intervals    Midpoint      f        fX
    Scores            X
   195-199           197         1        197
   190-194           192         2        384
   185-189           187         4        748
   180-184           182         5        910
   175-179           177         8       1416
   170-174           172        10       1720
   165-169           167         6       1002
   160-164           162         4        648
   155-159           157         4        628
   150-154           152         2        304
   145-149           147         3        441
   140-144           142         1        142
                             N = 50      8540
                           N/2 = 25

(20 scores lie above the interval 170-174 and 20 scores below it.)

(1) Mean = ΣfX / N = 8540 / 50 = 170.80
(2) Median = 169.5 + (5/10) × 5 = 172.00
(3) Crude Mode falls on class-interval 170-174 or at 172.00


2. Scores made by 200 adults upon a cancellation test
Class-interval = 4

Class-Intervals    Midpoint      f        fX
    Scores            X
135.5 to 139.5      137.5        3       412.5
131.5 to 135.5      133.5        5       667.5
127.5 to 131.5      129.5       16      2072.0
123.5 to 127.5      125.5       23      2886.5
119.5 to 123.5      121.5       52      6318.0
115.5 to 119.5      117.5       49      5757.5
111.5 to 115.5      113.5       27      3064.5
107.5 to 111.5      109.5       18      1971.0
103.5 to 107.5      105.5        7       738.5
                             N = 200   23888.0
                           N/2 = 100

(99 scores lie above 119.5 and 52 scores lie below 115.5.)

(1) Mean = ΣfX / N = 23,888.0 / 200 = 119.44
(2) Median = 115.5 + (48/49) × 4 = 119.42
(3) Crude Mode falls on class-interval 119.5 to 123.5 or at 121.50


(3) THE MEAN FROM COMBINED SAMPLES OR GROUPS
Suppose that on a certain test the mean for a group of 10 children
is 62, and that on the same test the mean for a group of 40 children is
66. Then the mean of the two groups combined is

(62 × 10 + 66 × 40) / 50

or 65.2. The formula for the weighted mean of n groups is

$$M_{comb} = \frac{N_1 M_1 + N_2 M_2 + \cdots + N_n M_n}{N_1 + N_2 + \cdots + N_n} \qquad (3)$$

(weighted arithmetical mean obtained from combining
n groups)
When only two groups have been combined, the weighted mean is

$$M_{comb} = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}$$
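The weighted mean of formula (3) can be sketched as follows. The function name `combined_mean` and the representation of each group as an (N, M) pair are assumptions made for illustration.

```python
def combined_mean(groups):
    """Formula (3): weighted mean of several groups.

    groups is a list of (N, M) pairs: group size and group mean.
    """
    weighted_sum = sum(n * m for n, m in groups)
    total_n = sum(n for n, _ in groups)
    return weighted_sum / total_n


# 10 children with a mean of 62 and 40 children with a mean of 66.
print(combined_mean([(10, 62), (40, 66)]))
```

Note that the simple average of the two means, 64, would be wrong here because the groups differ in size.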


2. The median (Mdn) *


(1) CALCULATION OF THE MEDIAN WHEN DATA ARE UNGROUPED
When ungrouped scores or other measures are arranged in order of
size, the median is the midpoint in the series. Two situations arise in
the computation of the median from ungrouped data: (a) when N is
odd, and (b) when N is even. To consider, first, the case where N is
odd, suppose we have the following integral "mental ages": 7, 10, 8,
12, 9, 11, 7, calculated from seven performance tests. If we arrange
these seven scores in order of size

7 7 8 (9) 10 11 12

the median is 9.0 since 9.0 is the midpoint of that score which lies
midway in the series. Calculation is as follows: There are three
scores above, and three below 9, and since a score of 9 covers the
interval 8.5 to 9.5, its midpoint is 9.0. This is the median.

Now if we drop the first score of 7 our series contains six scores

7 8 9 (9.5) 10 11 12

and the median is 9.5. Counting three scores in from the beginning of
the series, we complete score 9 (which is 8.5 to 9.5) to reach 9.5, the
upper limit of score 9. In like manner, counting three scores in from
the end of the series, we move through score 10 (10.5 to 9.5) reaching
9.5, the lower limit of score 10.

* The median is also designated as Md.

A formula for finding the median of a series of ungrouped scores is

Median = the (N + 1)/2 th measure in order of size (4)
(median from ungrouped data)

In our first illustration above, the median is on the (7 + 1)/2 or 4th
score counting in from either end of the series, that is, 9.0 (midpoint
8.5 to 9.5). In our second illustration, the median is on the (6 + 1)/2
or 3.5th score in order of size, that is, 9.5 (upper limit of score 9, or
lower limit of score 10).
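Formula (4) can be sketched in code. The function name `median_ungrouped` is hypothetical, and one assumption goes beyond the text: when the (N + 1)/2 position falls between two scores, the sketch takes the point halfway between them, which gives 9.5 in the six-score illustration.

```python
def median_ungrouped(scores):
    """Formula (4): the (N + 1)/2 th measure in order of size."""
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2   # counted from 1
    lower = ordered[int(position) - 1]  # score at, or just below, the position
    if position == int(position):       # N odd: the position is a whole number
        return float(lower)
    upper = ordered[int(position)]      # N even: halfway between two scores
    return (lower + upper) / 2


print(median_ungrouped([7, 10, 8, 12, 9, 11, 7]))  # seven mental ages
print(median_ungrouped([10, 8, 12, 9, 11, 7]))     # first score of 7 dropped
```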


(2) CALCULATION OF THE MEDIAN WHEN DATA ARE GROUPED INTO A
FREQUENCY DISTRIBUTION

When scores in a continuous series are grouped into a frequency
distribution, the median by definition is the 50% point in the dis-
tribution. To locate the median, therefore, we take 50% (i.e., N/2)
of our scores, and count into the distribution until the 50% point is
reached. The method is illustrated in the two examples in Table 5.
Since there are 50 scores in the first distribution, N/2 = 25, and the
median is that point in our distribution of Army Alpha scores which
has 25 scores on each side of it. Beginning at the small-score end of
the distribution, and adding up the scores in order, we find that
intervals 140-144 to 165-169, inclusive, contain just 20 f's, five
scores short of the 25 necessary to locate the median. The next
interval, 170-174, contains 10 scores assumed to be spread evenly
over the interval (p. 7). In order to get the five extra scores needed
to make exactly 25, we take 5/10 × 5 (the length of the interval)
and add this increment (2.5) to 169.5, the beginning of the interval
170-174. This puts the Mdn at 169.5 + 2.5 or at 172.0. The student
should note carefully that the median like the mean is a point and
not a score.

A second illustration of the calculation of the median from data
grouped into a frequency distribution is given in Table 5 (2). There
are 200 scores in this distribution; hence, N/2 = 100, and the median
must lie at a point 100 scores distant from either end of the distribu-
tion. If we begin at the small-score end of the distribution (103.5 to
107.5) and add the scores in order, 52 scores take us through the
interval 111.5 to 115.5. The 49 scores on the next interval (115.5 to
119.5) plus the 52 already counted off total 101, one score too many
to give us 100, the point at which the median falls. To get the 48
scores needed to make exactly 100 we must take 48/49 × 4 (the
length of the interval) and add this amount (3.92) to 115.5, the
beginning of interval 115.5 to 119.5. This procedure takes us
exactly 100 scores into the distribution, and locates the median at
119.42.

A formula for calculating the Mdn when the data have been classi-
fied into a frequency distribution is

$$Mdn = l + \left(\frac{N/2 - F}{f_m}\right) i \qquad (5)$$

(median computed from data grouped into a frequency distribution)
where

l = lower limit of the class-interval upon which the median lies
N/2 = one-half the total number of scores
F = sum of the scores on all intervals below l
fm = frequency (number of scores) within the interval upon which
the median falls
i = length of the class-interval

To illustrate the use of formula (5), consider the first example in
Table 5, page 30. Here l = 169.5, N/2 = 25, F = 20, fm = 10, and
i = 5. Hence, the median falls at 169.5 + ((25 − 20)/10) × 5 or at 172.0.
In the second example, l = 115.5, N/2 = 100, F = 52, fm = 49, and
i = 4. The median, therefore, is 115.5 + ((100 − 52)/49) × 4 or 119.42.
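Formula (5) can be sketched in code using the two distributions of Table 5. The function name `grouped_median` and its argument layout (lower limits and frequencies listed from the small-score end upward) are assumptions for illustration; the sketch does not handle gaps in the distribution.

```python
def grouped_median(lower_limits, freqs, width):
    """Formula (5): Mdn = l + ((N/2 - F) / fm) * i, counting up from the low end."""
    half = sum(freqs) / 2  # N/2
    below = 0              # F, the scores on all intervals below l
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= half:
            return lower + (half - below) / f * width
        below += f
    raise ValueError("distribution has no scores")


# Table 5 (1): Army Alpha scores, intervals 140-144 up to 195-199, i = 5.
alpha_limits = [139.5 + 5 * k for k in range(12)]
alpha_freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

# Table 5 (2): cancellation scores, 103.5-107.5 up to 135.5-139.5, i = 4.
cancel_limits = [103.5 + 4 * k for k in range(9)]
cancel_freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]

print(grouped_median(alpha_limits, alpha_freqs, 5))
print(round(grouped_median(cancel_limits, cancel_freqs, 4), 2))
```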


The steps involved in computing the Mdn from data tabulated
into a frequency distribution may be summarized as follows:

(1) Find N/2, that is, one-half of the cases in the distribution.
(2) Begin at the small-score end of the distribution and count off the
scores in order up to the lower limit (l) of the interval which
contains the median. The sum of these scores is F.
(3) Compute the number of scores necessary to fill out N/2, i.e.,
compute N/2 − F. Divide this quantity by the frequency (fm)
on the interval which contains the median; and multiply the
result by the size of the class-interval (i).
(4) Add the amount obtained by the calculations in (3) to the lower
limit (l) of the interval which contains the Mdn. This will give
the median of the distribution.


The median may also be computed by adding up one-half of the
scores from the top down in a frequency distribution. The procedure
is the same through step (3) in the summary above. When we count
down from the top of the distribution, however, the quantity found
in step (3) must be subtracted from the upper limit of the interval
containing the median. To illustrate with the data of Table 5 (1),
page 30, counting down in the f-column, 20 scores complete interval
175-179, and we reach 174.5, the upper limit of the interval 170-174.
Five scores of the 10 on this interval are needed to make 25 (N/2).
Hence we have 174.5 − (5/10) × 5 = 172.0, which checks our first cal-
culation of the median. In Table 5 (2), the median found by count-
ing down is 119.5 − (1/49) × 4 or 119.42.


(3) CALCULATION OF THE Mdn WHEN (a) THE FREQUENCY DISTRIBU-
TION CONTAINS GAPS; AND WHEN (b) THE FIRST OR LAST INTERVAL
HAS INDETERMINATE LIMITS
(a) Difficulty arises when it becomes necessary to calculate the
median from a distribution in which there are gaps or zero frequency
upon one or more intervals. The method to be followed in such cases
is shown in Table 6 below. Since N = 10, and N/2 = 5, we count up


TABLE 6 Computation of the median when there are gaps in the dis-
tribution

Class-Intervals (Scores)        f


Мап = 9.5 +8 X 2 = 9.5 


the frequency column five scores through 6-7. Ordinarily, this would
put the median at 7.5, the lower limit of interval 8-9. If we check
this median, however, by counting down the frequency column five
scores, the median falls at 11.5, the lower limit of 12-13. Obviously,
the discrepancy between these two values of the median is due to the
two intervals 8-9 and 10-11 (each of which has zero frequency)
which lie between 6-7 and 12-13. In order to have the median come
out at the same point, whether computed from the top or the bottom
of the frequency distribution, the procedure usually followed in cases
like this is to have interval 6-7 include 8-9, thus becoming 6-9; and
to have interval 12-13 include 10-11, becoming 10-13. Lengthening
these intervals from two to four units eliminates the zero frequency
on the adjacent intervals by spreading the numerical frequency over
them. If now we count off five scores, going up the frequency column
through 6-9, the median falls at 9.5, the upper limit of this interval.
Also, counting down the frequency column five scores, we arrive at a
median value of 9.5, the upper limit of 6-9, or the lower limit of
10-13. Computation from the two ends of the series now gives con-
sistent results: the median is 9.5 in both instances.
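The discrepancy between the two directions of counting can be demonstrated with a sketch. The frequencies below are hypothetical, chosen only to match the description in the text (N = 10, intervals of two units, five scores through 6-7, nothing on 8-9 and 10-11). The function `median_from_both_ends` is likewise an assumption: it counts in from each end and then takes the point midway between the two results, which in this two-gap case coincides with lengthening the intervals to 6-9 and 10-13. It is not claimed to reproduce the merging procedure for every arrangement of gaps.

```python
def median_from_both_ends(lower_limits, freqs, width):
    """Return (median counting up, median counting down, point midway)."""
    half = sum(freqs) / 2  # N/2

    def count_in(limits, fs, from_top):
        counted = 0
        for lower, f in zip(limits, fs):
            if f > 0 and counted + f >= half:
                step = (half - counted) / f * width
                # Counting down, subtract from the upper limit of the interval.
                return lower + width - step if from_top else lower + step
            counted += f
        raise ValueError("distribution has no scores")

    bottom_value = count_in(lower_limits, freqs, from_top=False)
    top_value = count_in(lower_limits[::-1], freqs[::-1], from_top=True)
    return bottom_value, top_value, (bottom_value + top_value) / 2


# Hypothetical distribution: intervals 2-3, 4-5, 6-7, 8-9, 10-11, 12-13, 14-15.
limits = [1.5, 3.5, 5.5, 7.5, 9.5, 11.5, 13.5]
freqs = [1, 2, 2, 0, 0, 3, 2]
print(median_from_both_ends(limits, freqs, 2))
```

When the distribution has no gap at the median, the two counts agree and the midway point is simply the ordinary median.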
(b) When scores scatter widely, the last class-interval in a fre-
quency distribution may be designated as "80 and above" or simply
as 80+. This means that all scores above 80 are thrown into this
interval, the upper limit of which is indeterminate. The same lump-
ing together of scores may also occur at the beginning of the distribu-
tion, when the first interval, for example, is designated "20 and
below" or 20−. The lower limit of the beginning class-interval is
now indeterminate. In irregular distributions like these, the median
is readily computed since each score is simply counted as one fre-
quency whether accurately classified or not. But it is impossible to
calculate the mean exactly when the midpoint of one or more in-
tervals is unknown. The mean depends upon the absolute size of the
scores (or their midpoints) and is directly affected by indeterminate
interval limits.


3. The mode


In a simple ungrouped series of measures the "crude" or "em-
pirical" mode is that single measure or score which occurs most fre-
quently. For example, in the series 10, 11, 11, 12, 12, 13, 13, 13, 14, 14,
the most often recurring measure, namely 13, is the crude or em-
pirical mode. When data are grouped into a frequency distribution,
the crude mode is usually taken to be the midpoint of that interval
which contains the largest frequency. In example 1, Table 5, page 30,
the interval 170-174 contains the largest frequency and hence 172.0,
its midpoint, is the crude mode. In example 2, Table 5, the largest
frequency falls on 119.5 to 123.5 and the crude mode is at 121.5, the
midpoint.
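Both forms of the crude mode can be sketched briefly. The function names are illustrative assumptions; when two values tie for the largest frequency, these sketches simply return the one met first, a case the text does not discuss.

```python
from collections import Counter


def crude_mode(scores):
    """Most frequently occurring score in an ungrouped series."""
    counts = Counter(scores)
    return max(counts, key=counts.get)


def crude_mode_grouped(midpoints, freqs):
    """Midpoint of the interval holding the largest frequency."""
    return midpoints[freqs.index(max(freqs))]


series = [10, 11, 11, 12, 12, 13, 13, 13, 14, 14]
print(crude_mode(series))

# Table 5 (1), from interval 195-199 down to 140-144.
alpha_midpoints = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
print(crude_mode_grouped(alpha_midpoints, alpha_f))
```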

When calculating the mode from a frequency distribution, we dis-
tinguish between the "true" mode and the crude mode. The true
mode is the point (or "peak") of greatest concentration in the dis-
tribution; that is, the point at which more measures fall than at any
other point. When the scale is divided into finely graduated units,
when scores are recorded exactly, and when N is large, the crude
mode closely approaches the true mode. Ordinarily, however, the
crude mode is only approximately equal to the true mode. A formula
for approximating the true mode, when the frequency distribution is
symmetrical, or at least not badly skewed (page 97) is


Mode = 3 Mdn − 2 Mean (6)


(approximation to the true mode calculated from a frequency dis-
tribution)


If we apply this formula to the data in Table 5, the mode is 174.40
for the first distribution, and 119.38 for the second. The first mode
is somewhat larger and the second slightly smaller than the crude
modes obtained from the same distributions.
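Formula (6) is a one-line calculation; the sketch below applies it to the medians and means already found in Table 5. The function name `approximate_mode` is an assumption, and results are rounded to two decimals to avoid floating-point noise.

```python
def approximate_mode(median, mean):
    """Formula (6): Mode = 3 Mdn - 2 Mean."""
    return 3 * median - 2 * mean


print(round(approximate_mode(172.00, 170.80), 2))  # first distribution in Table 5
print(round(approximate_mode(119.42, 119.44), 2))  # second distribution in Table 5
```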

The crude mode is often an unstable measure of central tendency.
This instability is not, however, so serious a drawback as might seem
at first glance. The crude mode is usually employed as a simple,
inspectional "average," to indicate in a rough way the center of
concentration in the distribution. For this purpose it need not be
calculated as exactly as the median or mean.


II. Calculation of the Mean by the "Assumed Mean" or Short Method


In Table 5, page 30, the mean was calculated by multiplying the
midpoint (X) of each interval by the frequency (number of scores)
on the interval, summing up these values (the fX column) and dividing
by N, the number of scores. This straightforward method (called
the Long Method) gives accurate results but often requires the
handling of large numbers and entails tedious calculation. Because
of this, the "Assumed Mean" method, or simply the Short Method,
has been devised for computing the mean. The Short Method does
not apply to the calculation of the median or the mode. These
measures are always found by the methods previously described.

The most important fact to remember in calculating the mean by
the Short Method is that we "guess" or "assume" a mean at the
outset, and later apply a correction to this assumed value (AM) in
order to obtain the actual mean (M) (see Table 7, below).


TABLE 7 The calculation of the mean by the short method

(Data from Table 1, 50 Army Alpha scores)

      (1)            (2)      (3)    (4)     (5)
Class-Intervals    Midpoint
    Scores            X        f      x'     fx'
   195-199           197       1      5       5
   190-194           192       2      4       8
   185-189           187       4      3      12
   180-184           182       5      2      10
   175-179           177       8      1       8   (+43)
   170-174           172      10      0       0
   165-169           167       6     -1      -6
   160-164           162       4     -2      -8
   155-159           157       4     -3     -12
   150-154           152       2     -4      -8
   145-149           147       3     -5     -15
   140-144           142       1     -6      -6   (-55)
                           N = 50           -12

AM = 172.00      c = -12/50 = -.240
 i = 5          ci = -.240 x 5 = -1.20


There is no set rule for assuming a mean.* The best plan is to take the
midpoint of an interval somewhere near the center of the distribution,
and if possible the midpoint of that interval which contains the
largest frequency. In Table 7, the largest f is on interval 170-174,
which also happens to be almost in the center of the distribution.
Hence the AM is taken at 172.0, the middle of this interval. When the
question of the AM is settled, we determine the correction which must
be applied to the AM in order to get M. Steps are as follows:


First, we fill in the x' column,† column (4). Here are entered the
deviations of the midpoints of the different steps measured from
the AM in units of class-interval. Thus 177, the midpoint of
175-179, deviates from 172, the AM, by one interval; and a "1"
is placed in the x' column opposite 177. In like manner, 182
deviates two intervals from 172; and a "2" goes in the x' column
opposite 182. Reading on up the x' column from 172, we find the
succeeding entries to be 3, 4, and 5. The last entry, 5, is the
interval-deviation of 197 from 172; the actual score-deviation, of
course, is 25.

* The method outlined here gives consistent results no matter where the
mean is tentatively placed or assumed.
† x' is regularly used to denote the deviation of a score X from the
assumed mean (AM); x is the deviation of a score X from the actual
mean (M) of the distribution.

Returning to 172, we find that the x' of this midpoint measured
from the AM (from itself) is zero; hence a zero is placed
in the x' column opposite 170-174. Below 172, all of the x'
entries are negative, since all of the midpoints are less than 172,
the AM. So the x' of 167 from 172 is -1 interval; and the x' of
162 from 172 is -2 intervals. The other x' entries are -3, -4, -5
and -6 intervals.

The x' column completed, we compute the fx' column, column
(5). The fx' entries are found in exactly the same way as are the
fX in Table 5, page 30. Each x' in column (4) is multiplied or
"weighted" by the appropriate f in column (3). Note again that
in the Short Method we weight each f by the deviation of its
midpoint from the AM in units of class-interval, instead of by its
actual deviation from the mean of the distribution. For this reason,
the computation of the fx' column is much more simple than is the
calculation of the fX column by the method given on page 29.
All of the fx' on intervals above (greater than) the AM are
positive; and all fx' on intervals below (smaller than) the AM are
negative, since the signs of the fx' depend upon the signs of
the x'.


From the fx' column the correction is obtained as follows: The
sum of the positive values in the fx' column is 43; and the sum
of the negative values in the fx' column is -55. There are,
therefore, 12 more minus fx' values than plus (the algebraic sum is
-12); and -12 divided by 50 (N) gives -.240 which is the
correction (c) in units of class-interval. If we multiply c
(-.240) by i, the length of the interval (here 5), the result is ci
(-1.20), the score correction, or the correction in score units.
When -1.20 is added to 172.00, the AM, the result is the actual
mean, 170.80.


The process of calculating the mean by the Short Method may be
summarized as follows:
(1) Tabulate the scores or measures into a frequency distribution.

(2) "Assume" a mean as near the center of the distribution as
    possible, and preferably on the interval containing the largest
    frequency.

(3) Find the deviation of the midpoint of each class-interval from
    the AM in units of interval.

(4) Multiply or weight each deviation (x') by its appropriate f,
    the f opposite it.

(5) Find the algebraic sum of the plus and minus fx' and divide this
    sum by N, the number of cases. This gives c, the correction in
    units of class-interval.

(6) Multiply c by the interval length (i) to get ci, the score
    correction.

(7) Add ci algebraically to the AM to get the actual mean. Sometimes
    ci will be positive and sometimes negative, depending upon where
    the mean has been assumed. The method works equally well in
    either case.
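The seven steps above can be sketched in a few lines, using the Table 7 data. The function name and argument layout are illustrative assumptions. The second call places the AM on a different interval to show that the result does not depend on where the mean is assumed.

```python
# Illustrative sketch: the function name and arguments are assumptions.
def short_method_mean(midpoints, freqs, assumed_mean, width):
    """Return (M, c): the mean by the Short Method and the correction c
    expressed in units of class-interval."""
    n = sum(freqs)
    # Step 3: x' is the deviation of each midpoint from the AM in interval units.
    x_prime = [(m - assumed_mean) / width for m in midpoints]
    # Steps 4 and 5: weight each x' by its f, sum algebraically, divide by N.
    c = sum(f * xp for f, xp in zip(freqs, x_prime)) / n
    # Steps 6 and 7: convert c to score units (ci) and add it to the AM.
    return assumed_mean + c * width, c


# Table 7: 50 Army Alpha scores, highest interval first.
MIDPOINTS = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
FREQS = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

mean, c = short_method_mean(MIDPOINTS, FREQS, assumed_mean=172.0, width=5)
print(round(c, 3), round(mean, 2))      # correction and mean with AM = 172

mean_alt, c_alt = short_method_mean(MIDPOINTS, FREQS, assumed_mean=167.0, width=5)
print(round(c_alt, 3), round(mean_alt, 2))  # a different AM changes c, not M
```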


III. When To Use the Various Measures of Central Tendency


The beginning student of statistics is often puzzled to know which
measure of central tendency to use in a given problem. The following
will serve as a convenient summary.


1. Use the mean

(1) When each score or measure should have equal weight in
    determining the central tendency. Since the mean is the sum
    of the scores divided by their number, each score has equal
    weight in its determination.

(2) When the measure of central tendency having the highest
    reliability is desired (p. 194).

(3) When standard deviations and product-moment coefficients
    of correlation are to be subsequently computed (p. 138).


2. Use the median

(1) When a quick and easily computed measure of central
    tendency is wanted.

(2) When there are extreme measures which would affect the
    mean disproportionately (p. 34).

(3) When it is desired that certain scores should influence the
    central tendency but all that is known about them is that
    they are above or below the median (p. 35).


3. Use the mode

(1) When the most often recurring or "popular" score is sought.

(2) When a quick approximate measure of concentration is all
    that is wanted.


PROBLEMS 


1. Calculate the mean, median, and mode for the following frequency
   distributions. Use the Short Method in computing the mean.


(1) Scores      f          (2) Scores      f
    70-71       2              90-94       2
    68-69       2              85-89       2
    66-67       3              80-84       4
    64-65       4              75-79       8
    62-63       6              70-74       6
    60-61       7              65-69      11
    58-59       5              60-64       9
    56-57       4              55-59       7
    54-55       2              50-54       5
    52-53       3              45-49       0
    50-51       1              40-44       2

             N = 39                    N = 56

(3) Scores      f          (4) Scores      f
    120-122     2              100-109     5
    117-119     2              90-99       9
    114-116     2              80-89      14
    111-113     4              70-79      19
    108-110     5              60-69      21
    105-107     9              50-59      30
    102-104     6              40-49      25
    99-101      3              30-39      15
    96-98       4              20-29      10
    93-95       2              10-19       8
    90-92       1              0-9         6

             N = 40                    N = 162

(5) Scores      f          (6) Scores      f
    120-139    50              15          1
    100-119   150              14          2
    80-99     500              13          3
    60-79     250              12          6
    40-59      50              11         12
                               10         15
           N = 1000            9          22
                               8          31
                               7          18
                               6           6
                               5           2
                               4           2

                                       N = 120


2. Compute the mean and the median for each of the two distributions in
   Problem 5(a), page 24, tabulated in 3- and 5-unit intervals. Compare
   the two means and the two medians, and explain any discrepancy found.
   (Let the first interval in the first distribution be 61-63; the first
   interval in the second distribution, 60-64.)


3. (a) The same test is given to the three sections of Grade VI. Results
       are: Section I, M = 24, N = 32; Section II, M = 31, N = 54;
       Section III, M = 35, N = 16. What is the general mean for the
       grade?
   (b) The mean score on AGCT in Camp A is 102, N = 1500; and in
       Camp B 106, N = 450. What is the mean for Camps A and B
       combined?


4. (a) Compute the median of the following 16 scores by the method of
       p. 34.

       Scores            f
       20 up to 22
       18 up to 20
       16 up to 18
       14 up to 16
       12 up to 14
       10 up to 12
        8 up to 10
        6 up to 8
        4 up to 6
        2 up to 4
        0 up to 2

                     N = 16


   (b) In a group of 50 children, the 8 children who took longer than 5
       minutes to complete a performance test were marked D.N.C. (did
       not complete). In computing a measure of central tendency for this
       distribution of scores, what measure would you use, and why?

   (c) Find the medians of the following arrays of ungrouped scores by
       formula (4) p. 32:

       (1) 21, 24, 27, 29, 29, 30, 32, 33, 35, 38, 42, 45.
       (2) 54, 59, 64, 67, 70, 72, 73, 75, 78, 83, 90.
       (3) 7, 8, 9, 9, 10, 11.

5. The time by your watch is 10:31 o'clock. In checking with two friends,
   you find that their watches give the time as 10:25 and 10:34. Assuming
   that the three watches are equally good timepieces, what do you think
   is probably the "correct time"?

6. What is meant popularly by the "law of averages"?

7. (a) When one uses the term "in the mode" does he have reference to the
       mode of a distribution?

   (b) What is approximately the modal time for each of the following
       meals: breakfast, lunch, dinner? Explain your answers.

   (c) Why is the median usually the best measure of the typical
       contribution in a church collection?


ANSWERS


1. (1) Mean   =  60.76        (2) Mean   = 67.36
       Median =  60.79            Median = 66.77
       Mode   =  60.85            Mode   = 65.59

   (3) Mean   = 106.00        (4) Mean   = 55.43
       Median = 105.83            Median = 55.17
       Mode   = 105.49            Mode   = 54.65

   (5) Mean   =  87.5         (6) Mean   =  8.85
       Median =  87.5             Median =  8.55
       Mode   =  87.5             Mode   =  7.95

2. Class-interval = 3         Class-interval = 5
   Mean   = 72.92             Mean   = 73.00
   Median = 71.75             Median = 72.7

3. (a) 29.43    (b) 103 (to the nearest whole number)

4. (a) Median = 11.5
   (c) (1) Median = 31.0
       (2) Median = 72.0
       (3) Median =  9.0

5. Mean is 10:30.


CHAPTER 3


MEASURES OF VARIABILITY


In Chapter 2 the calculation of three measures of central tendency
(measures typical or representative of a set of scores as a whole)
was described. Ordinarily, the next step is to find some measure of
the variability of our scores, i.e., of the "scatter" or "spread" of the
separate scores or measures around their central tendency. It will be
the task of this chapter to show how measures of variability may be
computed.

The usefulness of a measure of variability can be seen from a
simple example. Suppose a test of controlled association has been
administered to a group of 50 boys and to a group of 50 girls. The
mean scores are, boys, 34.6 seconds, and girls, 34.5 seconds. So far
as the means go there is no difference in the performance of the two
groups. But suppose the boys' scores are found to range from 15 to
51 seconds and the girls' scores from 19 to 45 seconds. This difference
in range shows that in a general way the boys "cover more territory,"
are more variable, than the girls; and this greater variability may be
of more interest than the lack of a difference in the means. If a group
is homogeneous, that is, made up of individuals of nearly the same
ability, most of the scores will fall around the same point on the
scale, the range will be relatively short, and the variability will be
small. But if the group contains individuals of widely differing
capacities, scores will be strung out from high to low, the range will
be relatively wide, and the variability large.

This situation is represented graphically in Figure 7, which shows
two frequency distributions of the same area (N) and same mean
(50) but of very different variability. Group A ranges from 20 to 80,
and Group B from 40 to 60. Group A is three times as variable as
Group B (it spreads over three times the distance on the scale of
scores), though both distributions have the same central tendency.


FIG. 7 Two distributions of the same area (N) and mean (50) but of
very different variability


Four measures have been devised to indicate the variability or
dispersion within a set of measures. These are (1) the range, (2) the
quartile deviation or Q, (3) the mean deviation or MD, and (4) the
standard deviation or SD.


I. Calculation of Measures of Variability


1. The range


In grouping the scores in Table 1 into a frequency distribution
(p. 5) we have already had occasion to use the range. It may be
redefined simply as the interval between the largest and the smallest
scores. In the illustration above, the range of boys' scores was 51 - 15
or 36 seconds and the range of girls' scores 45 - 19 or 26 seconds. The
range is the most general measure of spread or scatter, and is
computed when we wish to make a rough comparison of two or more
groups for variability. Since the range takes account of the extremes
of the series only, it is unreliable when N is small or when many or
large gaps (i.e., zero f's) occur in the frequency distribution.


2. The quartile deviation or Q


The quartile deviation or Q is one-half the distance between the
75th and 25th percentiles in a frequency distribution. The 25th
percentile or Q1 is the first quartile on the score-scale, the point
below which lie 25% of the scores. The 75th percentile or Q3 is the
third quartile on the score-scale, the point below which lie 75% of the
scores.*

To find Q, we must first calculate the 75th and 25th percentiles.
These values are found by exactly the same method employed in
calculating the median. To find Q1, count off 25% of the scores from
the beginning of the distribution (low end); and to find Q3 count off
75% of the scores from the low end of the distribution, or 25% from the
high end.

Table 8 illustrates the calculation of Q for the distribution of fifty
Alpha scores tabulated in Table 1. First, to find Q1, count off 1/4
of N (12.5) from the low-score end of the distribution. When
the scores (f) are added in order, the first four class-intervals
(140-144 to 155-159, inclusive) are found to contain 10 scores. The next
interval, 160-164, contains four scores, assumed to be spread evenly
over the interval. Since we need only 2.5 additional scores to make
up the necessary 12.5, take 2.5/4 x 5 (the interval) and add this
amount, viz., 3.13, to 159.50, the beginning of the interval which
contains Q1. This calculation locates Q1 at 162.63 (see Table 8).
Q3 is found in the same way by counting off 3/4 of N (37.5) from
the small-score end of the distribution. The f's on 140-144 to
170-174, inclusive, added in order, total 30. The next interval,
175-179, contains 8 scores. To make up the necessary 37.5, therefore,
take 7.5/8 x 5 (the interval) and add this amount (viz., 4.69) to
174.50. This locates Q3 at 179.19 (see Table 8).


TABLE 8 The calculation of the Q, MD and SD from data grouped into
        a frequency distribution


1. Data from Table 1, page 5, 50 Army Alpha scores

      (1)            (2)      (3)      (4)        (5)        (6)
Class-Intervals    Midpoint
    Scores            X        f        x          fx        fx²
   195-199           197       1      26.20      26.20     686.44
   190-194           192       2      21.20      42.40     898.88
   185-189           187       4      16.20      64.80    1049.76
   180-184           182       5      11.20      56.00     627.20
   175-179           177       8       6.20      49.60     307.52
   170-174           172      10       1.20      12.00      14.40
   165-169           167       6      -3.80     -22.80      86.64
   160-164           162       4      -8.80     -35.20     309.76
   155-159           157       4     -13.80     -55.20     761.76
   150-154           152       2     -18.80     -37.60     706.88
   145-149           147       3     -23.80     -71.40    1699.32
   140-144           142       1     -28.80     -28.80     829.44
                           N = 50               502.00    7978.00

Mean = 170.80 (Table 5, p. 30)

N/4 = 12.5 and 3N/4 = 37.5

Q1 = 159.5 + (2.5/4) x 5 = 162.63      Q3 = 174.5 + (7.5/8) x 5 = 179.19

Q  = (Q3 - Q1)/2 = (179.19 - 162.63)/2 = 8.28

MD = Σ|fx|/N = 502.00/50 = 10.04

SD = sqrt(Σfx²/N) = sqrt(7978.00/50) = 12.63


2. Data from Table 3, p. 13, 200 cancellation scores

      (1)            (2)      (3)      (4)        (5)        (6)
Class-Intervals    Midpoint
    Scores            X        f        x          fx        fx²
135.5 to 139.5      137.5      3      18.06      54.18     978.49
131.5 to 135.5      133.5      5      14.06      70.30     988.42
127.5 to 131.5      129.5     16      10.06     160.96    1619.26
123.5 to 127.5      125.5     23       6.06     139.38     844.64
119.5 to 123.5      121.5     52       2.06     107.12     220.67
115.5 to 119.5      117.5     49      -1.94     -95.06     184.42
111.5 to 115.5      113.5     27      -5.94    -160.38     952.66
107.5 to 111.5      109.5     18      -9.94    -178.92    1778.46
103.5 to 107.5      105.5      7     -13.94     -97.58    1360.27
                          N = 200              1063.88    8927.29

Mean = 119.44 (Table 5)

N/4 = 50 and 3N/4 = 150

Q1 = 111.5 + (25/27) x 4 = 115.20      Q3 = 119.5 + (49/52) x 4 = 123.27

Q  = (Q3 - Q1)/2 = (123.27 - 115.20)/2 = 4.04

MD = Σ|fx|/N = 1063.88/200 = 5.32

SD = sqrt(Σfx²/N) = sqrt(8927.29/200) = 6.68


* It may be noted that the second quartile, Q2, is the median.


When Q1 and Q3 are known, Q, the quartile deviation, is found
from the formula

\[ Q = \frac{Q_3 - Q_1}{2} \qquad (7) \]

(quartile deviation calculated from grouped data)

In the present problem, Q = (179.19 - 162.63)/2 or 8.28.


A second illustration of the calculation of Q from a frequency
distribution is given in Table 8, example 2. Since the N of this
distribution is 200, 1/4 of N equals 50. The intervals 103.5 to 107.5
and 107.5 to 111.5 contain 25 scores; and the next interval, 111.5 to
115.5, contains 27 scores, which makes a total of 52, two more than the
50 wanted. To find the point reached by just 50 scores, take 25/27 x 4
(the interval) and add this amount (3.70) to 111.50, the lower limit
of 111.5 to 115.5. This locates Q1 at 115.20.

To find Q3 count off 3/4 of N or 150 scores from the small-score
end of the distribution. The first four intervals include 101 scores,
and the next interval, 119.5 to 123.5, contains 52 scores. To fill out
the required 150, take 49/52 x 4, the length of the interval, and add
this increment (3.77) to 119.50, to locate Q3 at 123.27. Substituting
115.20 for Q1 and 123.27 for Q3 in formula (7) we get a Q of 4.04.
The quartiles Q1 and Q3 mark off the limits of the middle 50% of
scores in the distribution and the distance between these points is
called the interquartile range. Q is one-half the range of the middle
50% or the semi-interquartile range. Since Q measures the average
distance of the quartile points from the median, it is a good measure
of score density around the middle of the distribution. If the scores
in the distribution are packed closely together the quartiles will be
close to one another and Q will be small; if the scores are widely
scattered, the quartiles will be relatively far apart, and Q will be
large (see Fig. 7, p. 44).

When the distribution is asymmetrical or "skewed," Q1 and Q3
are at unequal distances from the median, and the difference between
(Q3 - Mdn) and (Mdn - Q1) gives a measure of the amount and
direction of the skewness (p. 98). When the distribution is
symmetrical or normal, Q marks off exactly the 25% of cases just above,
and the 25% of cases just below, the median. The median then lies
just halfway between the two quartiles Q1 and Q3. In a normal
distribution Q becomes the PE (probable error). The terms Q and
PE are often used interchangeably, but it is best to restrict the use
of the term PE to the normal probability curve (p. 97).

Steps in calculating Q may be summarized as follows:


To find Q1

(1) Divide N by 4.

(2) Begin at the low-score end of the distribution, and count off the
    scores up to the interval which contains Q1.

(3) Divide the number of scores necessary to locate Q1 (i.e., to complete
    N/4) by the frequency in the interval reached in (2) above, and
    multiply the result by the class-interval.

(4) Add the amount obtained in (3) to the lower limit of the class
    interval within which Q1 lies. This gives Q1.

To find Q3

(1) Find 3/4 of N.

(2) Begin at the low-score* end of the distribution, and count up the
    scores until the interval which contains Q3 is reached.

(3) Divide the number of scores required to locate Q3 by the frequency
    within the interval reached in (2) and multiply the result by the
    class-interval.

(4) Add the amount obtained in (3) to the lower limit of the class
    interval within which Q3 lies. This gives Q3.

To find Q

Substitute Q3 and Q1 in formula (7).
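The counting-off procedure summarized above is the same for Q1, Q3, and the median, so a single routine can serve all three. The following is a minimal sketch using the 50 Army Alpha scores; the function name `grouped_point` and the table layout (exact lower limit, frequency) are illustrative assumptions.

```python
# Illustrative sketch: names and table layout are assumptions.
# Each entry is (exact lower limit of the class-interval, frequency),
# listed from the low-score end of the distribution upward.
ALPHA = [(139.5, 1), (144.5, 3), (149.5, 2), (154.5, 4), (159.5, 4), (164.5, 6),
         (169.5, 10), (174.5, 8), (179.5, 5), (184.5, 4), (189.5, 2), (194.5, 1)]


def grouped_point(table, width, fraction):
    """Point on the score-scale below which `fraction` of the N scores lie."""
    n = sum(f for _, f in table)
    target = fraction * n          # e.g. N/4 for Q1, 3N/4 for Q3
    below = 0                      # scores counted off so far
    for lower, f in table:
        if f > 0 and below + f >= target:
            # Scores are assumed to be spread evenly over the interval.
            return lower + (target - below) / f * width
        below += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_point(ALPHA, 5, 0.25)   # 159.5 + (2.5/4) x 5
q3 = grouped_point(ALPHA, 5, 0.75)   # 174.5 + (7.5/8) x 5
q = (q3 - q1) / 2                    # formula (7)
print(q1, q3, round(q, 2))
```

The unrounded quartiles are 162.625 and 179.1875, which the text carries as 162.63 and 179.19.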


* Q3 may also be found by counting in 25% from the high-score end of the
distribution. To avoid confusion, the method given above is recommended
to the beginner.


3. The mean deviation or MD


The mean deviation or MD (also written average deviation or AD
and mean variation or MV) is the mean of the deviations of all the
separate measures in a series taken from their central tendency
(usually the arithmetic mean; less frequently the median or mode). In
averaging deviations to find the MD, no account is taken of signs;
and all deviations whether positive or negative are treated as
positive.

(1) CALCULATION OF MD FROM UNGROUPED DATA

An example will make our definition clearer. If we have five
scores, 6, 8, 10, 12, and 14, the mean is easily found to be 10. It is
then a simple process to find the deviation of each measure from
this mean by subtracting the mean from each measure. Thus 6,
the first score, minus 10 equals -4; 8 - 10 = -2; 10 - 10 = 0;
12 - 10 = 2; and 14 - 10 = 4. The five deviations measured from
the mean are -4, -2, 0, 2, and 4. If we add these deviations without
regard to signs the sum is 12; and dividing 12 by 5 (N), we get 2.4,
the mean of the five deviations from their mean, or the MD. The
formula for the MD when scores are ungrouped may be written

\[ \text{MD} = \frac{\Sigma |x|}{N} \qquad (8) \]

(mean deviation for ungrouped measures)

in which Σ|x| denotes the sum of the deviations from the mean
and N is, as before, the number of cases or items. The bars | | enclosing
x indicate that signs are disregarded. The small letter x in the
formula always represents the deviation of a score X from its mean
M, i.e., x = X - M.


(2) CALCULATION OF MD FROM GROUPED DATA

In Table 8 the calculation of the MD for scores grouped into a
frequency distribution is illustrated by two problems. The mean of the
fifty Army Alpha scores in problem 1 has already been found in
Table 5, page 30, to be 170.80. To compute the MD of the scores in
this distribution we must take our deviations (x's) around this mean.
But since the scores have been grouped into class-intervals, we
are unable to get the deviation of each separate score from the mean.
In lieu of the separate score deviations, therefore, we take the
deviation of the midpoint of each interval from the mean. The
substitution of a midpoint for all of the scores within an interval is
the only difference between the computation of x's from grouped and
from ungrouped data. The x of 195-199, for example, is 26.20, found by
subtracting 170.80 (the mean) from 197.00 (the midpoint of the
interval). All of the x's are positive as far down as 170-174, as in each
interval the midpoint is numerically larger than the mean. From the
interval 165-169 on down to the beginning of the series, the x's are
negative as the midpoints of these intervals are all smaller than 170.80.
Thus the x of interval 165-169 is -3.80; and the x of the lowest
interval in the distribution, 140-144, is -28.80.
It will be helpful in calculating deviations from the mean to
remember that the mean is always subtracted from the individual
score or midpoint value. That is, x (deviation) = X (score or
midpoint) - M (mean). The calculation is algebraic. When the score or
midpoint is numerically larger than the mean the deviation is
positive; when the score or midpoint is numerically smaller than the
mean the deviation is negative.

Column (4), Table 8, page 45, gives the deviation of each class-interval,
as represented by its midpoint, from the mean of the distribution.
There are more scores on some intervals than on others; hence each
midpoint deviation in column (4) must be weighted by the number of
scores (f) which it represents, to give the fx column, column (5). The
first fx is 26.20; for, since there is only one score on 195-199, we
multiply the first x by 1. The next fx is 42.40, since each of the two
scores on 190-194 has an x of 21.20. In the same way we obtain the
other fx's by multiplying, in each case, the x in column (4) by its
corresponding f in column (3). When all of the fx's have been
calculated, the column is added without regard to sign, and the
resulting sum is divided by N to give the MD. In the present problem
the MD equals 502.00/50 or 10.04.

The formula for the MD when measures are grouped into a
frequency distribution is as follows:

\[ \text{MD} = \frac{\Sigma |fx|}{N} \qquad (9) \]

(mean deviation for scores grouped into a frequency distribution)


The second problem in Table 8 shows the calculation of the MD
for 200 cancellation scores grouped into a frequency distribution in
class-intervals of four. The mean of this distribution was found to be
119.44 (Table 5, page 30). Hence, the x of the topmost interval,
135.5 to 139.5 (midpoint 137.50), from the mean is 18.06. Since the
class-interval is constant in size, the next x may be found by
subtracting 4 (the interval) from 18.06; and each succeeding x may be
found by subtracting 4 from the x just preceding it.

The fx's in column (5) are found, as shown in problem 1, by
weighting each x by the f which it represents, the f opposite it.
The sum of the fx column is 1063.88; and, since N is equal to 200,
from formula (9) we obtain 5.32 as the MD of the scores in this
distribution around their mean of 119.44.

In a symmetrical or normal distribution the MD, when measured
off on the scale above and below the mean, marks the limits of the
middle 57.5% of the measures. The MD is always slightly larger,
therefore, than the Q which marks off the limits of the middle 50%.
A large MD means that the scores of the distribution tend to scatter
widely around the central tendency; a small MD that they tend to be
concentrated within a relatively narrow range.
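Formulas (8) and (9) can be sketched side by side. The function names below are illustrative assumptions; the data are the five-score example and the two problems of Table 8.

```python
# Illustrative sketch: function names are assumptions.
def mean_deviation(scores):
    """Formula (8): MD = sum of |x| / N, with x = X - M."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(score - mean) for score in scores) / n


def grouped_mean_deviation(midpoints, freqs):
    """Formula (9): MD = sum of |fx| / N, each midpoint standing for
    all of the scores on its interval."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    return sum(f * abs(m - mean) for m, f in zip(midpoints, freqs)) / n


# Five ungrouped scores.
print(mean_deviation([6, 8, 10, 12, 14]))

# Table 8, problem 1: 50 Army Alpha scores (highest interval first).
alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
print(round(grouped_mean_deviation(alpha_mid, alpha_f), 2))

# Table 8, problem 2: 200 cancellation scores (highest interval first).
cancel_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]
print(round(grouped_mean_deviation(cancel_mid, cancel_f), 2))
```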


4. The standard deviation or SD


The standard deviation or SD is the measure of variability
customarily employed in research. The SD differs from the MD in
several respects. In calculating the MD we disregard signs and treat
all deviations as positive; in finding the SD we avoid this difficulty
of signs by squaring the separate deviations. Again, the squared
deviations used in computing the SD are always taken from the mean
of the distribution, and never from the median or mode. The
conventional symbol used to denote the SD is the Greek letter sigma (σ).
(1) CALCULATION OF SD FROM UNGROUPED DATA

The standard deviation or σ is the square root of the mean of the
squared deviations taken from the mean of the distribution. To
illustrate the calculation of σ in a simple ungrouped series, let us
consider the example given on page 48, to illustrate the
calculation of the MD, in which the deviations of the five measures,
6, 8, 10, 12, and 14 from their mean of 10 were found to be -4, -2,
0, 2, and 4, respectively. Squaring each of these deviations, we
obtain 16, 4, 0, 4, and 16. Summing these five squares and dividing
by five, we obtain the mean of the squares, and, extracting the square
root, we get 2.83, the SD of this series. The formula for the SD or σ
when the series of scores is ungrouped is as follows:

\[ \sigma = \sqrt{\frac{\Sigma x^2}{N}} \qquad (10) \]

(standard deviation calculated from ungrouped data)


(2) CALCULATION OF SD FROM GROUPED DATA

Table 8 illustrates the calculation of σ when scores are grouped
into a frequency distribution. The process is identical with that
used for ungrouped items, except that, in addition to squaring the x
of each midpoint from the mean, we weight each of these squared
deviations by the frequency which it represents, that is, by the
frequency opposite it. This multiplication gives the fx² column. By
simple algebra, x times fx = fx²; and accordingly the easiest way to
obtain the entries in column fx² is to multiply the corresponding x's
and fx's in columns (4) and (5). The first fx² entry, for example, is
686.44, the product of 26.20 times 26.20; the second entry is 898.88,
the product of 42.40 times 21.20; and so on to the end of the column.
All of the fx² are necessarily positive since each negative x is matched
by a negative fx. The sum of the fx² column (7978.00) divided by N
(50) gives the mean of the squared deviations; and the square root of
this mean is the SD, 12.63. The formula is

\[ \sigma = \sqrt{\frac{\Sigma f x^2}{N}} \qquad (11) \]

(SD or σ for data grouped into a frequency distribution)

Problem 2 of Table 8, page 46, furnishes another illustration of the
calculation of σ from grouped data. In column (6), the fx² entries
are obtained, as in the previous problem, by multiplying each
x by its corresponding fx. The sum of the fx² column is 8927.29;
and N is 200. Hence, applying formula (11) we get 6.68 as the SD.
The standard deviation is less affected by sampling errors (p. 194)
than is the Q or the MD and is a more stable measure of dispersion.
In a normal distribution the SD, when measured off above and below
the mean, marks the limits of the middle 68.26% (roughly the middle
two-thirds) of the distribution.* This is approximately true also of
the σ in less symmetrical distributions. For example, in the first
problem in Table 8 the middle 65% of the scores fall between score
183 (170.80 + 12.63) and score 158 (170.80 - 12.63).† The SD is
larger than the MD which is, in turn, larger than Q. These
relationships supply a rough check upon the accuracy of the measures of
variability.
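Formulas (10) and (11) may be sketched as follows. The function names are illustrative assumptions. Note that the divisor is N, as in the formulas of this chapter, not N - 1.

```python
import math


# Illustrative sketch: function names are assumptions.
def standard_deviation(scores):
    """Formula (10): square root of the mean of the squared deviations."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((score - mean) ** 2 for score in scores) / n)


def grouped_standard_deviation(midpoints, freqs):
    """Formula (11): each squared midpoint deviation is weighted by its f."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    return math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / n)


# Five ungrouped scores.
print(round(standard_deviation([6, 8, 10, 12, 14]), 2))

# Table 8, problem 1: 50 Army Alpha scores (highest interval first).
alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
print(round(grouped_standard_deviation(alpha_mid, alpha_f), 2))

# Table 8, problem 2: 200 cancellation scores (highest interval first).
cancel_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
cancel_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]
print(round(grouped_standard_deviation(cancel_mid, cancel_f), 2))
```

For the Army Alpha scores the three measures stand in the expected order: SD 12.63, MD 10.04, Q 8.28.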

* See page 96.
† See page 71 for method of calculating the percentage of scores falling
between two points in a frequency distribution.


II. Calculation of the SD by the Short Method


1. Calculation of σ from grouped data


On page 37, the Short Method of calculating the mean was
outlined. This method consisted essentially in "guessing" or assuming
a mean, and later applying to this value a correction to give the
actual mean. The Short Method may also be used to advantage in
calculating the SD.* It is a decided time and labor saver in dealing
with grouped data; and is well-nigh indispensable in the calculation
of σ's in a correlation table (p. 134).

The Short Method of calculating the SD is illustrated in Table 9.
The calculation of the mean is repeated in the table, as is also the
calculation of the mean and SD by the direct or Long Method. This
plan affords a readier comparison of the two techniques.

* The MD may also be calculated by the assumed mean or Short Method.
The MD is rarely used, however, and the Short Method of calculation
(neither very short nor very satisfactory) is not given.


TABLE 9 The calculation of the SD by the short method.‡ Data from
        Table 1. Calculations by the long method given for
        comparison


1. Short Method

   (1)         (2)      (3)    (4)     (5)            (6)
  Scores     Midpoint
                X        f      x'     fx'            fx'²
 195-199       197       1      5       5              25
 190-194       192       2      4       8              32
 185-189       187       4      3      12              36
 180-184       182       5      2      10              20
 175-179       177       8      1       8   (+43)       8
 170-174       172      10      0       0               0
 165-169       167       6     -1      -6               6
 160-164       162       4     -2      -8              16
 155-159       157       4     -3     -12              36
 150-154       152       2     -4      -8              32
 145-149       147       3     -5     -15              75
 140-144       142       1     -6      -6   (-55)      36
                     N = 50           -12             322

1. AM = 172.00     c = -12/50 = -.240     ci = -.240 x 5 = -1.20
   ci = -1.20      c² = .0576
    M = 170.80

2. SD = i x sqrt(Σfx'²/N - c²) = 5 x sqrt(322/50 - .0576)
      = 12.63


2. Long Method

   (1)         (2)      (3)     (4)       (5)        (6)        (7)
  Scores     Midpoint
                X        f       fX        x          fx        fx²
 195-199       197       1      197      26.20      26.20     686.44
 190-194       192       2      384      21.20      42.40     898.88
 185-189       187       4      748      16.20      64.80    1049.76
 180-184       182       5      910      11.20      56.00     627.20
 175-179       177       8     1416       6.20      49.60     307.52
 170-174       172      10     1720       1.20      12.00      14.40
 165-169       167       6     1002      -3.80     -22.80      86.64
 160-164       162       4      648      -8.80     -35.20     309.76
 155-159       157       4      628     -13.80     -55.20     761.76
 150-154       152       2      304     -18.80     -37.60     706.88
 145-149       147       3      441     -23.80     -71.40    1699.32
 140-144       142       1      142     -28.80     -28.80     829.44
                     N = 50    8540                502.00    7978.00

 M = 8540/50 = 170.80

SD = sqrt(Σfx²/N) = sqrt(7978.00/50) = 12.63


‡ The calculation of the mean is repeated from Table 7.
The formula for computing σ by the Short Method is

\[ \sigma = i\sqrt{\frac{\Sigma fx'^2}{N} - c^2} \tag{12} \]

(SD from a frequency distribution when deviations are taken from
an assumed mean)


in which Σfx'² is the sum of the squared deviations in units of class-
interval, taken from the assumed mean, c² is the squared correction
in units of class-interval, and i is the class-interval.

The calculation of σ by the Short Method may be followed in detail
from Table 9. Deviations are taken from the assumed mean (172.0)
in units of class-interval and entered in column (4) as x'. In col-
umn (5) each x' is weighted or multiplied by its f to give the fx'; and
in column (6) the fx'²'s are found by multiplying each x' in column
(4) by the corresponding fx' in column (5). The process is identical
with that used in the Long Method except that the x''s are all ex-
pressed in units of class-interval. This considerably simplifies the
multiplication. The calculation of c has already been described on
page 38: c is the algebraic sum of column (5) divided by N. The
sum of the fx'² column is 322, and c² is .0576. Applying formula (12)
we get 2.526 × 5 (interval) or 12.63 as the σ of the distribution.
Formula (12) for the calculation of σ by the Short Method holds
good no matter what the size of c, the correction in units of class-
interval, or where the mean has been assumed.
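
The Short Method arithmetic for grouped data can be checked with a few lines of Python. This is only an illustrative sketch, not part of the original text: the function name `short_method_sd` and its argument names are assumptions chosen for the example, and the data are the midpoints and frequencies of Table 9.

```python
import math

def short_method_sd(midpoints, freqs, assumed_mean, interval):
    """Mean and SD of grouped data by the assumed-mean (Short) method."""
    n = sum(freqs)
    # Deviations from the assumed mean in units of class-interval (x').
    x_prime = [(m - assumed_mean) / interval for m in midpoints]
    sum_fx = sum(f * x for f, x in zip(freqs, x_prime))        # sum of fx'
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, x_prime))   # sum of fx'^2
    c = sum_fx / n                                             # correction, interval units
    mean = assumed_mean + c * interval
    sd = interval * math.sqrt(sum_fx2 / n - c ** 2)            # Short Method formula
    return mean, sd

# Midpoints and frequencies of the fifty Army Alpha scores (Table 9).
midpoints = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
freqs = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

mean_172, sd_172 = short_method_sd(midpoints, freqs, assumed_mean=172, interval=5)
mean_182, sd_182 = short_method_sd(midpoints, freqs, assumed_mean=182, interval=5)
print(round(mean_172, 2), round(sd_172, 2))
print(round(mean_182, 2), round(sd_182, 2))
```

Running the sketch with two different assumed means gives the same mean and SD both times, which is the point of the closing sentence above: the result does not depend on where the mean is assumed.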


2. Calculation of σ from the original measures or scores


It will often save time and labor to apply the Short Method for
computing σ directly to the ungrouped scores. The method is illus-
trated in Table 10. Note that the ten scores are ungrouped, and that
it is not necessary even to arrange them in order of size. The assumed
mean is taken at zero, and each score becomes at once a deviation
(x') from this AM, that is, each score (X) is unchanged. The correc-
tion, c, is the difference between the actual mean (M) and the as-
sumed mean (0), i.e., c = M − 0; hence c is simply M itself. The
mean is calculated, as before, by summing the scores and dividing
by N (see page 28). To find σ, we square the x''s (or the X's, which
are the scores), sum them to get Σ(x')² or ΣX², divide by N, and
subtract M², the correction squared. The square root of the result
is σ. A convenient formula is

\[ \sigma = \sqrt{\frac{\Sigma X^2}{N} - M^2} \tag{13} \]

Replacing the M² by (ΣX/N)², we have

\[ \sigma = \frac{1}{N}\sqrt{N\Sigma X^2 - (\Sigma X)^2} \tag{14} \]

(σ calculated from original scores by the Short Method)


TABLE 10  To illustrate the calculation of the SD from original scores
          when the assumed mean is taken at zero, and data are un-
          grouped

Scores (X)     x' (or X)     (x')² or (X²)
    18             18             324
    25             25             625
    21             21             441
    19             19             361
    27             27             729
    31             31             961
    22             22             484
    25             25             625
    28             28             784
    20             20             400
   ---            ---            ----
   236            236            5734

AM = 0        N = 10
M = 236/10 = 23.6
c = 23.6 − 0 = 23.6
c² = 556.96

\[ \sigma = \sqrt{\frac{5734}{10} - (23.6)^2} \times 1 \text{ (interval)} = \sqrt{16.44} = 4.05 \]
This method of calculating σ is especially useful when there are
relatively few scores, say fifty or less, and when the scores are ex-
pressed in not more than two digits,* so that the squares do not
become unwieldy. A calculating machine and a table of squares will
greatly facilitate computation. Simply sum the scores as they stand
and divide by N to get M. Then enter the squares of the scores in
the machine in order, sum, and substitute the result in formula (13)
or formula (14).

* For an application of this method to the calculation of coefficients of cor-
relation, and a scheme for [...] the original scores so as to [...] the
need for handling large numbers, see page 143.
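
As a check on the arithmetic, the ten scores of Table 10 can be run through both forms of the ungrouped formula in Python. This is a sketch added for illustration only; the function name `sd_from_raw_scores` and the variable names are assumptions, not part of the original text.

```python
import math

def sd_from_raw_scores(scores):
    """Mean and SD of ungrouped scores with the assumed mean taken at zero."""
    n = len(scores)
    sum_x = sum(scores)                      # sum of the scores
    sum_x2 = sum(x * x for x in scores)      # sum of the squared scores
    mean = sum_x / n
    # First form: mean of the squares minus the squared mean.
    sd_a = math.sqrt(sum_x2 / n - mean ** 2)
    # Second form: the same quantity written without computing the mean first.
    sd_b = math.sqrt(n * sum_x2 - sum_x ** 2) / n
    return mean, sd_a, sd_b

scores = [18, 25, 21, 19, 27, 31, 22, 25, 28, 20]   # the ten scores of Table 10
mean, sd_a, sd_b = sd_from_raw_scores(scores)
print(mean, round(sd_a, 2), round(sd_b, 2))
```

The two forms agree, as they must, since the second is only an algebraic rearrangement of the first.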


3. Effect upon σ of (a) adding a constant to each score, or (b) multiplying
   each score by the same number

(a) If each score in a frequency distribution is increased by some
set amount, say 5, the σ is unchanged. The table below provides a
simple illustration. The mean of the original scores is 7 and σ is 1.41.
When each score is increased by 5, the mean is 12 (7 + 5), but σ is
still 1.41. Adding a constant (e.g., 5, 10, 15) to each score simply
moves the whole distribution up the scale 5, 10, or 15 points. The
mean is increased by the amount of the constant added, but the vari-
ability (σ) is not affected. If a constant is subtracted from each
score, the distribution is moved down the scale by that amount; the
mean is decreased by the amount of the constant, and σ, again, is
unchanged.


Original scores (X)    x     x²       Scores + 5 (X')    x'    x'²
        9              2     4              14            2     4
        8              1     1              13            1     1
        7              0     0              12            0     0
        6             -1     1              11           -1     1
        5             -2     4              10           -2     4
      5|35                  10            5|60                 10
     M = 7                               M = 12

\[ \sigma = \sqrt{\frac{10}{5}} = 1.41 \qquad \sigma = \sqrt{\frac{10}{5}} = 1.41 \]


(b) What happens to the mean and σ when each score is multi-
plied by a constant is shown in the table below:

Original scores (X)    Scores × 10 (X')     x'     x'²
        9                    90             20     400
        8                    80             10     100
        7                    70              0       0
        6                    60            -10     100
        5                    50            -20     400
      5|35                 5|350                  1000
     M = 7                 M = 70
     σ = 1.41

\[ \sigma = \sqrt{\frac{1000}{5}} = \sqrt{200} = 14.14 \]

Each score in the list of five, shown above, has been multiplied
by 10; and the net effect of this operation is to multiply the mean
and the σ by 10.
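
Both effects can be confirmed numerically. The short Python sketch below is an illustration added here, with assumed names (`mean_and_sd`, `original`, `shifted`, `scaled`); it recomputes the mean and SD of the five scores after adding 5 and after multiplying by 10.

```python
import math

def mean_and_sd(scores):
    """Mean and SD computed directly from deviations about the mean."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return mean, sd

original = [9, 8, 7, 6, 5]
shifted = [x + 5 for x in original]    # a constant added to each score
scaled = [x * 10 for x in original]    # each score multiplied by a constant

results = {}
for label, data in [("original", original), ("plus 5", shifted), ("times 10", scaled)]:
    results[label] = mean_and_sd(data)
    print(label, round(results[label][0], 2), round(results[label][1], 2))
```

Adding the constant moves only the mean; multiplying by the constant multiplies both the mean and the SD by that constant.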
4. The σ from combined distributions

When two sets of scores have been thrown together into a single
distribution, it is possible to calculate the σ of the total distribution
from the σ's of the two component distributions. The formula is

\[ \sigma_{comb} = \sqrt{\frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)}{N}} \tag{15} \]

(SD of a distribution obtained by combining two frequency
distributions)

in which
    σ₁ = SD of distribution 1
    σ₂ = SD of distribution 2
    d₁ = (M₁ − M_comb)
    d₂ = (M₂ − M_comb)

N₁ and N₂ are the numbers of cases in component distributions 1
and 2, respectively, and N = (N₁ + N₂). The M_comb is the mean of
the combined distribution got from formula (3), page 31.
An example will illustrate the use of formula (15). Suppose that in
one class of 25 children, the mean (M₁) of an achievement test is
80 and σ₁ = 15; and that in a second class of 75 children, the
mean (M₂) on the same test is 70 and σ₂ = 25. What is the σ of
the total distribution of 100 cases? First, we find that M_comb
= (25 × 80 + 75 × 70)/100 = 72.50. We have, then, that d₁ = (80 − 72.5)
and d₁² = 56.25; d₂ = (70 − 72.5) and d₂² = 6.25. Substituting
in formula (15) for σ₁, σ₂, d₁, d₂, N₁, and N₂ we find that

\[ \sigma_{comb} = \sqrt{\frac{25(225 + 56.25) + 75(625 + 6.25)}{100}} = 23.32 \]

Formula (15) may easily be extended to include more than two com-
ponent distributions by adding N₃, σ₃, d₃, and so on.
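
The combination rule lends itself to a small Python function that accepts any number of component distributions. This is a sketch for illustration only; `combined_mean_sd` and the tuple layout `(N, M, SD)` are assumptions made for the example, and the data are the two classes described above.

```python
import math

def combined_mean_sd(groups):
    """groups: list of (n, mean, sd) tuples, one per component distribution."""
    total_n = sum(n for n, _, _ in groups)
    # Mean of the combined distribution: weighted mean of the component means.
    m_comb = sum(n * m for n, m, _ in groups) / total_n
    # Each component contributes N_k * (sd_k^2 + d_k^2), where d_k = M_k - M_comb.
    total = sum(n * (sd ** 2 + (m - m_comb) ** 2) for n, m, sd in groups)
    return m_comb, math.sqrt(total / total_n)

# Class 1: N = 25, M = 80, SD = 15.  Class 2: N = 75, M = 70, SD = 25.
m_comb, sd_comb = combined_mean_sd([(25, 80, 15), (75, 70, 25)])
print(m_comb, round(sd_comb, 2))
```

Because the function sums over a list, a third or fourth component distribution is handled simply by adding another tuple.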


III. The Coefficient of Variation, V


It is often desirable to compare the variability of a given group
upon two or more different tests; or to compare the variabilities of
two or more groups upon the same test. We may wish, for example,
to know whether 8-year-old girls are more variable in height than
in weight; or whether 10-year-old boys are more variable than 10-
year-old girls in vocabulary or in memory span. The Q, MD, and SD
are not suitable, ordinarily, for such comparisons. These measures
give the absolute spread or dispersion of test scores around their
means in terms of the units of the test. But owing to differences in
measuring units, we cannot compare the variability in height and the
variability in weight of a given group directly; nor can we compare
the relative variability in height of two groups, say boys and girls,
unless the means of the two distributions are at least approximately
equal. To enable us to tell whether one group is more variable than
another, we need a measure which takes account both of the central
tendency and of the variability of the group, and which is inde-
pendent of the units in which ability is expressed. One such measure
is the ratio σ/M, called the coefficient of variation, or V. The for-
mula for V is

\[ V = \frac{100\sigma}{M} \tag{16} \]

(the coefficient of variation or coefficient of relative variability) *


The following illustrations will make the use of the formula clear.
Consider, first, the case where abilities are measured in different
units. A group of 7-year-old boys has a mean height of 45 inches
with a σ of 2.5 inches; and a mean weight of 50 pounds with a σ of
6.0 pounds. In which trait is the group more variable, height or
weight? Since we cannot compare inches and pounds directly, it is
impossible to answer this question by reference to the SD's of the
height and weight distributions. But we can compare the relative
variability of the two distributions in terms of their coefficients of
variation. Thus,

\[ V_{ht} = \frac{100 \times 2.5}{45} = 5.6 \quad \text{by (16)} \]

and

\[ V_{wt} = \frac{100 \times 6.0}{50} = 12.0 \quad \text{by (16)} \]

from which it appears that these boys are 5.6/12 or 47% as variable
in height as in weight.

Now let us consider the case where variability is measured in the
same units, but around different points on the scale. At the end of five
minutes, a group of 50 children had worked an average of 20.50 ex-
amples correctly, the σ being 5.24. At the end of ten minutes, the
same group had worked an average of 34.80 examples correctly, the
σ being 9.62. If we compared the σ's of the two distributions directly,
we would be inclined to conclude that the group was nearly
twice as variable at the end of the 10-minute period as it was at the
end of the 5-minute period, since the σ has increased from 5.24 to
9.62. This conclusion is correct as far as the absolute spread or vari-
ability within the group is concerned. But to compare the relative
dispersion of the group in the two periods, we must note that, with
increase in σ, the means have also increased from 20.50 to 34.80.
The coefficients of variation give the following results:

\[ \text{For the 5-minute period: } V = \frac{100 \times 5.24}{20.50} = 25.6 \]

\[ \text{For the 10-minute period: } V = \frac{100 \times 9.62}{34.80} = 27.6 \]

Thus, instead of being about 50% as variable in the 5-minute period
as in the 10-minute period, the group is 25.6/27.6 or 93% as variable,
when the mean is taken into account as well as the absolute variability.

* The multiplier 100 is introduced for the purpose of avoiding small frac-
tional results.
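
The comparison of the two work periods can be reproduced in Python. This is an illustrative sketch, not part of the original text; `coefficient_of_variation` is an assumed helper name, and the V's are rounded to one decimal before the ratio is taken, following the text's own arithmetic.

```python
def coefficient_of_variation(sd, mean):
    """V = 100 * SD / M: variability expressed as a percentage of the mean."""
    return 100 * sd / mean

# Examples worked correctly by the same 50 children after 5 and after 10 minutes.
v_5min = round(coefficient_of_variation(5.24, 20.50), 1)
v_10min = round(coefficient_of_variation(9.62, 34.80), 1)

absolute_ratio = 5.24 / 9.62       # ratio of the two SD's
relative_ratio = v_5min / v_10min  # ratio of the two V's

print(v_5min, v_10min)
print(round(absolute_ratio, 2), round(relative_ratio, 2))
```

The ratio of the SD's is a little over one half, while the ratio of the V's is close to one, which is the contrast the passage draws between absolute and relative variability.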
Objection has been raised * to the use of V in comparing the rela-
tive variability of test scores because the “true” zero point of ability
in mental and educational tests is unknown. This objection does not
[...]. Suppose that we have given a vocabulary test to a group
of children and have obtained a mean of 25 and a σ of 5, so that V
is equal to 20. [...] words, and hence the mean score as well as every
subject's score will be increased by 30. The absolute variability of the
group (σ) will, however, remain unchanged [...] exactly the same
relative position as [...]. The mean changes (from 25 to 55) without
a corresponding change in σ; V changes from 20 to 9; and, since we
could add 40 or 400 items as easily as 30, V appears to be a very
unstable measure.

While theoretically correct, criticism of V because of the arbitrary
[...] meaning for the test as it stands. If the range of difficulty in the
test is altered, or the units changed, [...]. V, therefore, is in a sense
no more arbitrary than M, and the objection to this measure can be
directed with equal force against M.

* Franzen, R., “Statistical Issues,” Journal of Educational Psychology, 1924,
15, [...]; Thurstone, L. L., “The Absolute Zero in Intelligence Measurement,”
Psychological Review, 1928, 35, 175-197.

V is most useful, perhaps, in comparing the variability of a group
upon the same test administered under different conditions, as, for
example, when a group of students works at a task with and without
distraction. The zero point here, at least, remains substantially con-
stant. V may also be used to compare two or more groups [...] com-
pared in tests of logical memory or picture completion. In both of
these cases it is probably justifiable to assume that the “true” zero
point of ability is sensibly the same for the groups compared.

It is, perhaps, most difficult to interpret V when the variability [...]
mental tests is a matter of interest. If [...] for variability in paragraph
reading and [...], it should be made plain that the V's refer [...]
upon which performance has been measured. [...] and arithmetic may,
and probably [...], because of difference in test units, range [...] test,
and position of arbitrary zero [...] of V to the particular measures
which have [...] will furnish useful information.


IV. When To Use the Various Measures of Variability 
1. Use the range
   (1) When the data are too scant or too scattered to justify the calcula-
       tion of any other measure of variability.
   (2) When a knowledge of the total spread of scores is all that is wanted.

2. Use the Q
   (1) For a quick, inspectional measure of variability.
   (2) When there are scattered or extreme measures.
   (3) When the degree of concentration around the median is sought.

3. Use the MD
   (1) When it is desired to weight all deviations according to their size.
   (2) When extreme deviations should influence the measure of variabil-
       ity, but not influence it unduly.

4. Use the SD
   (1) When the measure having the highest degree of reliability is sought
       (p. 194).
   (2) When it is desired that extreme deviations have a proportionally
       greater influence upon the measure of variability.
   (3) When coefficients of correlation or measures of reliability are subse-
       quently to be computed (p. 182).


PROBLEMS


1. Calculate the Q and σ for each of the four frequency distributions given
   on page 40 under problem 1, Chapter 2.

2. Calculate the σ of the 25 ungrouped scores given on page 24, problem
   [...], taking the AM at zero. Compare your result with the σ's cal-
   culated from the frequency distributions of the same scores which you
   tabulated in class-intervals of three and five units.

3. For the following list of test scores,
       52, 50, 56, 68, 65, 62, 57, 70
   (a) Find the M and σ by method on page 55.
   (b) Add 6 to each score and recalculate M and σ.
   (c) Subtract 50 from each score, and calculate M and σ.
   (d) Multiply each score by 5 and compute M and σ.

4. (a) In Sample A (N = 150), M = 120 and σ = 20; in Sample B
       (N = 75), M = 126 and σ = 2[...]. What are the mean and SD of
       A and B when combined into one distribution of 225 cases?
   (b) What are the mean and SD obtained by combining the following
       three distributions?

       Distribution     N      M      σ
            I           20     60     8
            II         190     50     2
5. Calculate coefficients of variation for the following traits:

   Trait                  Unit              Group               M        σ

   Length of Head         mms.              802 males           190.52   [...]

   Body Weight            pounds            868,445 males       141.54   [...]

   Tapping Speed          M of 5 trials     68 adults,          196.91   [...]
                          30" each          male and female

   Memory Span            No. repeated      263 males           6.60     [...]
                          correctly

   General Intelligence   Points scored     1101 adults         153.3    [...]
   (Otis Group
   Intell. Scale)

   Rank these traits in order for relative variability. Judged by their V's,
   which trait is the most variable? which the least variable? which traits
   have true zeros?


6. (a) Why is the Q the best measure of variability when there are scat-
       tered or extreme scores?
   (b) Why does the σ weight extreme deviations more than does the MD?


ANSWERS

1. (1) Q = 3.33      (2) Q = 8.13
       σ = 4.99          σ = 11.33
   (3) Q = 4.50      (4) Q = 16.41
       σ = 7.23          σ = 24.13

2. σ of ungrouped scores = 6.72
   σ of scores grouped in 3-unit intervals = 6.71
   σ of scores grouped in 5-unit intervals = 6.78

3. (a) M = 60     (b) M = 66     (c) M = 10     (d) M = 300
       σ = 6.91       σ = 6.91       σ = 6.91       σ = 34.55

4. (a) M = 122.0; σ = 20[...]
   (b) M = 48.00; σ = 18.05

5. V's in order are 3.10; 12.50; 13.63; 17.12; 15.30. Order of
   variability from most to least: Memory Span; General Intelligence;
   Tapping Speed; Weight; Head Length. Last two traits have true zeros.
CUMULATIVE DISTRIBUTIONS, GRAPHIC
METHODS, AND PERCENTILES


In Chapter 1, we learned how to represent the frequency distribu-
tion by means of the polygon and the histogram. In the present
chapter, other descriptive methods will be considered: the cumula-
tive frequency graph, the cumulative percentage curve or ogive, and
certain simple graphical devices. Also, methods will be given for
calculating percentiles and percentile ranks from frequency distribu-
tions and directly from graphs.

I. The Cumulative Frequency Graph


1. Construction of the cumulative frequency graph

The cumulative frequency graph is another way of representing a
frequency distribution by means of a diagram. [...] polygon and histo-
gram in Figures 2, 4, and 5, pages 11, 17, and 18. The first two col-
umns for each of the distributions in Table 11 repeat Table 5, page
30; but in the third column (Cum. f) scores have been “ac-
cumulated” progressively from the bottom of the distribution
upward. To illustrate, in the distribution of Alpha scores the
first cumulative frequency is 1; 1 + 3, from the low end of the
distribution, gives 4 as the next entry; 4 + 2 = 6; 6 + 4 = 10, etc.
The last cumulative frequency is, of course, equal to 50 or N, the
total frequency.
TABLE 11  Cumulative frequencies for the two distributions given in
          Table 5, page 30

        Army Alpha                       Cancellation
 Scores      f    Cum. f        Scores            f    Cum. f
195-199      1      50       135.5 to 139.5       3     200
190-194      2      49       131.5 to 135.5       5     197
185-189      4      47       127.5 to 131.5      16     192
180-184      5      43       123.5 to 127.5      23     176
175-179      8      38       119.5 to 123.5      52     153
170-174     10      30       115.5 to 119.5      49     101
165-169      6      20       111.5 to 115.5      27      52
160-164      4      14       107.5 to 111.5      18      25
155-159      4      10       103.5 to 107.5       7       7
150-154      2       6                        N = 200
145-149      3       4
140-144      1       1
         N = 50


The two cumulative frequency graphs which represent the dis-
tributions of Table 11 are shown in Figures 8 and 9. Consider first
the graph of the 50 Army Alpha scores in Figure 8. The class-inter-
vals of the distribution have been laid off along the X-axis. There
are 12 intervals, and by the “75% rule” given on page 12 there
should be about 9 unit distances (each equal to one class-interval)
laid off on the Y-axis. Since the largest cumulative frequency is 50,
each of these Y-units should represent 50/9 or 6 scores (approxi-
mately). Instead of dividing up the total Y-distance into 9 units
each representing 6 scores, however, we have, for convenience in plot-
ting, divided the total Y-distance into 10 units of 5 scores each. This
does not change significantly the 3:4 relationship of height to width
in the figure.

FIG. 8  Cumulative frequency graph
        (Data from Table 11, above)
        [X-axis: Scores, 139.5 to 199.5; Y-axis: Cumulative Frequencies, 0 to 50]

FIG. 9  Cumulative frequency graph
        (Data from Table 11, p. 64)
        [X-axis: Scores, 103.5 to 139.5; Y-axis: Cumulative Frequencies, 0 to 210]

In plotting the frequency polygon the frequency on each inter-
val is taken at the midpoint of the class-interval. But in constructing
a cumulative frequency curve each cumulative frequency is plotted
at the upper limit of the interval upon which it falls. This is because
we are adding progressively from bottom up and hence each cumu-
lative frequency carries through to the upper limit of the interval.
The first point on the curve is one Y-unit (the cumulative frequency
on 140-144) just above 144.5; the second point is 4 Y-units just
above 149.5; the third, 6 Y-units just above 154.5, and so on to the
last point which is 50 Y-units above 199.5. The plotted points are
joined to give the S-shaped cumulative frequency graph. In order
to have the curve begin on the X-axis it is started at 139.5
(upper limit of 134.5 to 139.5), the cumulative frequency of which
is 0.
The cumulative frequency curve in Figure 9 has been plotted from
the second distribution in Table 11 by the method just described.
The curve begins at 103.5, the lower limit of the first class-interval,*
and ends at 139.5, the upper limit of the last interval; and cumulative
frequencies, 7, 25, 52, etc., are all plotted at the upper limits of their
respective class-intervals. The height of this graph was determined
by the “75% rule” as in the case of the curve in Figure 8. There are
9 class-intervals laid off on the X-axis; hence, since 75% of 9 is 7
(approximately), the height of the figure should be about seven
class-interval units. To determine the score value of each Y-unit
divide 200 (the largest cumulative frequency) by 7 to give 30 (ap-
proximately). Each of the 7 Y-units has been taken to represent
30 scores.
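
The plotting rule described above (each cumulative frequency is placed at the exact upper limit of its interval, and the curve starts on the X-axis) can be sketched in Python. The block is illustrative only: the variable names are assumptions, and it prepares the plotting points for the Army Alpha distribution of Table 11 rather than drawing the graph.

```python
from itertools import accumulate

# Army Alpha distribution of Table 11, listed from the lowest interval upward.
lowest_scores = [140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
interval = 5

# Cumulate the frequencies from the bottom of the distribution upward.
cum_f = list(accumulate(freqs))

# The exact upper limit of interval 140-144 is 144.5, and so on.
upper_limits = [low - 0.5 + interval for low in lowest_scores]

# Begin the curve on the X-axis at the lower limit of the first interval.
points = [(lowest_scores[0] - 0.5, 0)] + list(zip(upper_limits, cum_f))

for x, y in points:
    print(x, y)
```

The printed pairs are the (score, cumulative frequency) points which, joined in order, give the S-shaped curve.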


II. Percentiles and Percentile Ranks


1. Calculation of percentiles in a frequency distribution


We have learned (p. 31) that the median is that point in a fre-
quency distribution below which lie 50% of the measures or scores;
and that Q₁ and Q₃ mark points in the distribution below which lie,
respectively, 25% and 75% of the measures or scores. In exactly the
same way in which the median and quartiles are found, we may com-
pute points below which lie 10%, 43%, 85%, or any percent of the
scores. These points are called percentiles, and are designated, in
general, by the symbol P_p, the p referring to the percentage of cases
below the given value. P₁₀, for example, is the point below which lie
10% of the scores; P₇₈, the point below which lie 78% of the scores.
It is evident that the median, expressed as a percentile, is P₅₀; also
Q₁ is P₂₅, and Q₃ is P₇₅.

The method of calculating percentiles is essentially the same as
that employed in finding the median. The formula is

\[ P_p = l + \left(\frac{pN - F}{f_p}\right) \times i \text{ (interval)} \tag{17} \]

(percentiles in a frequency distribution, counting from below up)

where

    p  = percentage of the distribution wanted, e.g., 10%, 33%, etc.
    l  = lower limit of the class-interval upon which P_p lies
    pN = part of N to be counted off in order to reach P_p
    F  = sum of all scores upon intervals below l
    f_p = number of scores within the interval upon which P_p falls
    i  = length of the class-interval

* Or the upper limit of the interval just below, i.e., 99.5 to 103.5.

In Table 12, the percentile points, P₁₀ to P₉₀, have been computed
by formula (17) for the distribution of scores made by the fifty col-
lege students upon Army Alpha, shown in Table 1, page 5. The
details of calculation are given in Table 12. We may illustrate
the method with P₇₀. Here pN = 35 (70% of 50 = 35), and from the
Cum. f we find that 30 scores take us through 170-174 up to 174.5, the
lower limit of the interval next above. Hence, P₇₀ falls upon 175-
179, and, substituting pN = 35, F = 30, f_p = 8 (frequency upon 175-
179), and i = 5 (class-interval) in formula (17), we find that
P₇₀ = 177.6 (for detailed calculation, see Table 12). This result
[...] exactly the same way as [...] calcula-
tions of the P_p in Table 12 in order to become thoroughly familiar
with the method.


TABLE 12  Calculation of certain percentiles in a frequency distribution

          (Data are fifty Army Alpha scores, see Table 1, p. 5)

 Scores      f    Cum. f
195-199      1      50
190-194      2      49
185-189      4      47
180-184      5      43
175-179      8      38
170-174     10      30
165-169      6      20
160-164      4      14
155-159      4      10
150-154      2       6
145-149      3       4
140-144      1       1
         N = 50

Percentiles

P₁₀₀ = 199.5
P₉₀  = 187.0
P₈₀  = 181.5
P₇₀  = 177.6
P₆₀  = 174.5
P₅₀  = 172.0
P₄₀  = 169.5
P₃₀  = 165.3
P₂₀  = 159.5
P₁₀  = 152.0
P₀   = 139.5

CALCULATION OF PERCENTILE POINTS

10% of 50 =  5     149.5 + ((5 − 4)/2) × 5   = 152.0
20% of 50 = 10     159.5 + ((10 − 10)/4) × 5 = 159.5
30% of 50 = 15     164.5 + ((15 − 14)/6) × 5 = 165.3
40% of 50 = 20     169.5 + ((20 − 20)/10) × 5 = 169.5
50% of 50 = 25     169.5 + ((25 − 20)/10) × 5 = 172.0 (Mdn)
60% of 50 = 30     174.5 + ((30 − 30)/8) × 5 = 174.5
70% of 50 = 35     174.5 + ((35 − 30)/8) × 5 = 177.6
80% of 50 = 40     179.5 + ((40 − 38)/5) × 5 = 181.5
90% of 50 = 45     184.5 + ((45 − 43)/4) × 5 = 187.0
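
For readers who wish to verify the percentile points of Table 12, the counting procedure can be sketched in Python. This block is an illustration and not part of the original text: the function name `percentile` and its arguments are assumptions, and the exact lower limits of the class-intervals are listed from the lowest interval upward.

```python
def percentile(p, lower_limits, freqs, interval):
    """P_p for grouped data; lower_limits are exact lower limits, lowest first."""
    n = sum(freqs)
    target = p * n / 100          # pN: number of scores to be counted off
    below = 0                     # F: scores on intervals below the current one
    for low, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            # Interpolate within the interval upon which P_p falls.
            return low + (target - below) / f * interval
        below += f
    raise ValueError("p must lie between 0 and 100")

# Fifty Army Alpha scores (Table 12), lowest interval first.
lower_limits = [139.5, 144.5, 149.5, 154.5, 159.5, 164.5,
                169.5, 174.5, 179.5, 184.5, 189.5, 194.5]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

deciles = [round(percentile(p, lower_limits, freqs, 5), 1) for p in range(10, 100, 10)]
print(deciles)
print(percentile(0, lower_limits, freqs, 5), percentile(100, lower_limits, freqs, 5))
```

When pN falls exactly on an interval boundary the sketch counts up through the lower interval, which gives the same point as starting from the lower limit of the interval next above.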


It should be noted that P₀, which marks the lower limit of the
first interval (namely, 139.5) lies at the beginning of the distribution.
P₁₀₀ marks the upper limit of the last interval, and lies at the end of
the distribution. These two percentiles represent limiting points.
Their principal value is to indicate the boundaries of the percentile
scale.


2. Calculation of percentile ranks in a frequency distribution 


We have seen in the last section how percentiles, e.g., P₁₅ or P₆₂,
may be calculated directly from a frequency distribution. To repeat
what has been said above, percentiles are points in a continuous dis-
tribution below which lie given percentages of N. We shall now
consider the problem of finding an individual's percentile rank (PR);
or the position on a scale of 100 to which the subject's score entitles
him. The distinction between percentile and percentile rank will be
clear if the reader remembers that in calculating percentiles he starts
with a certain percent of N, say 15% or 62%. He then counts into
the distribution the given percent and the point reached is the re-
quired percentile, e.g., P₁₅ or P₆₂. The procedure followed in comput-
ing percentile ranks is the reverse of this process. Here we begin with
an individual score, and determine the percentage of scores which lies
below it. If this percentage is 62, say, the score has a percentile rank
or PR of 62 on a scale of 100.

We may illustrate with Table 12. What is the PR of a man who
scores 163? Score 163 falls on interval 160-164. There are ten scores
up to 159.5, lower limit of this interval (see column Cum. f), and
four scores spread over this interval. Dividing 4 by 5 (interval
length) gives us .8 score per unit of interval. The score of 163, which
we are seeking, is 3.5 score units from 159.5, lower limit of the inter-
val within which the score of 163 lies. Multiplying 3.5 by .8 we get
2.8 as the score-distance of 163 from 159.5; and adding 2.8 to 10
(number of scores below 159.5) we get 12.8 as the part of N lying
below 163. Dividing 12.8 by 50 gives us 25.6% as that proportion of
N below 163; hence the percentile rank of score 163 is 26. The dia-
gram below will clarify the calculation:

        .8      .8      .8      .8      .8
     |-------|-------|-------|-------|-------|
   159.5   160.5   161.5   162.5   163.5   164.5
                                 ^
                               163.0

Ten scores lie below 159.5. Prorating the 4 scores on 160-164 over the
interval of 5, we have .8 score per unit of interval. Score 163 is just
.8 + .8 + .8 + .4 or 2.8 scores from 159.5; or score 163 lies 12.8 scores
or 25.6% (12.8/50) into the distribution.
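
The percentile-rank calculation is the reverse of the percentile calculation and can be sketched in the same manner. The Python below is illustrative only: `percentile_rank` is an assumed name, and the intervals are again given by their exact lower limits, lowest first.

```python
def percentile_rank(score, lower_limits, freqs, interval):
    """Percentage of N lying below a score in a grouped distribution."""
    n = sum(freqs)
    below = 0
    for low, f in zip(lower_limits, freqs):
        if low <= score < low + interval:
            # Prorate the f scores of this interval evenly over its length.
            below += (score - low) * f / interval
            return 100 * below / n
        below += f
    raise ValueError("score lies outside the distribution")

# Fifty Army Alpha scores (Table 12), lowest interval first.
lower_limits = [139.5, 144.5, 149.5, 154.5, 159.5, 164.5,
                169.5, 174.5, 179.5, 184.5, 189.5, 194.5]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

pr_163 = percentile_rank(163, lower_limits, freqs, 5)
pr_181 = percentile_rank(181, lower_limits, freqs, 5)
print(round(pr_163, 1), round(pr_181, 1))
```

A score is entered as the midpoint of its unit interval (163 is treated as 163.0), in keeping with the worked example.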


The PR of any score may be found in the same way. For example,
the percentile rank of 181 is 79 (verify it). The reader should note
that a score of 163 is taken as 163.0, midpoint of the score-interval
162.5 to 163.5. This means simply that the midpoint is assumed to
be the most representative value in a score-interval. The percentile
ranks for several scores may be read directly from Table 12. For
instance, 152 has a PR of 10, 172 (median) a PR of 50, and 187 a PR
of 90. If we take the percentile-points as representing approximately
the score-intervals upon which they lie, the PR of 160 (upon which
159.5 lies) is approximately 20 (see Table 12); the PR of 165 (upon
which 165.3 lies) is approximately 30; the PR of 170 is approxi-
mately 40; of 175, 60; of 178, 70; of 182, 80. These PR's are not
strictly accurate, to be sure, but the error is slight.


III. The Cumulative Percentage Curve or Ogive


1. Construction of the ogive

The cumulative percentage curve or ogive differs from the cumula-
tive frequency graph in that frequencies are expressed as cumulative
percents of N on the Y-axis instead of as cumulative frequencies.
Table 13 shows how cumulative frequencies can be turned into per-
centages of N. The distribution consists of scores made on a reading
test by 125 seventh-grade pupils. In columns (1) and (2) class-
intervals and frequencies are listed; and in column (3) the f's have
been cumulated from the low end of the distribution upward as
described before on page 63. These Cum. f's are expressed as per-
centages of N (125) in column (4). The conversion of Cum. f's into
cumulative percents can be carried out by dividing each cumulative
f by N; e.g., 2 ÷ 125 = .016, 6 ÷ 125 = .048, and so on. A better
method, especially when a calculating machine is available, is to
determine first the reciprocal, 1/N, called the Rate, and multiply
each cumulative f in order by this fraction. As shown in Table 13,
the Rate is 1/125 or .008. Hence, multiplying 2 by .008, we get .016
or 1.6%; 6 × .008 = .048 or 4.8%; 12 × .008 = .096 or 9.6%, etc.

The curve in Figure 10 represents an ogive plotted from the data
in column (4), Table 13. Class-intervals have been laid off on the
X-axis, and a scale consisting of 10 equal distances, each represent-
ing 10% of the distribution, has been marked off on the Y-axis. The
first point on the ogive is placed 1.6 Y-units just above 29.5; the
second point is 4.8 Y-units just above 34.5, etc. The last point is
100 Y-units above 79.5, upper limit of the highest class-interval.


TABLE 13  Calculation of cumulative percentages to upper limits of
          class-intervals in a frequency distribution

          (The data represent scores on a reading test achieved
          by 125 seventh-grade children)

     (1)           (2)      (3)           (4)
    Scores          f      Cum. f    Cum. Percent f
 74.5 to 79.5       1       125          100.0
 69.5 to 74.5       3       124           99.2
 64.5 to 69.5       6       121           96.8
 59.5 to 64.5      12       115           92.0
 54.5 to 59.5      20       103           82.4
 49.5 to 54.5      36        83           66.4
 44.5 to 49.5      20        47           37.6
 39.5 to 44.5      15        27           21.6
 34.5 to 39.5       6        12            9.6
 29.5 to 34.5       4         6            4.8
 24.5 to 29.5       2         2            1.6
                N = 125
 Rate = 1/N = 1/125 = .008


FIG. 10  Cumulative percentage curve or ogive plotted from the data
         of Table 13, above
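
The conversion of cumulative frequencies into cumulative percents by means of the Rate can be sketched in Python. The block is an added illustration with assumed variable names; it reproduces column (4) of Table 13 together with the points at which the ogive would be plotted.

```python
from itertools import accumulate

# Reading-test distribution of Table 13, lowest interval first.
upper_limits = [29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
freqs = [2, 4, 6, 15, 20, 36, 20, 12, 6, 3, 1]

n = sum(freqs)
rate = 1 / n                                   # the Rate, 1/N
cum_f = list(accumulate(freqs))                # column (3)
cum_pct = [round(100 * cf * rate, 1) for cf in cum_f]   # column (4)

# Each cumulative percent is plotted above the upper limit of its interval.
ogive_points = list(zip(upper_limits, cum_pct))
for limit, pct in ogive_points:
    print(limit, pct)
```

Multiplying by the Rate once per row replaces eleven separate divisions by N, which is the economy the text has in mind.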


2. Computing percentiles and percentile ranks from (a) the cumulative
   percentage distribution and from (b) the ogive


(a) Percentiles may be readily determined by direct interpolation
in column (4), Table 13. We may illustrate by calculating the 71st
percentile. Direct interpolation between the percentages in column
(4) gives the following:

        66.4% of the distribution up to 54.5
        71.0% (given)                    x
        82.4% of the distribution up to 59.5
        -----                           ----
        16.0%                            5.0

The 71st percentile lies 4.6% above 66.4%. By simple proportion,
x/5 = 4.6/16.0 or x = (4.6/16.0) × 5 = 1.4 (x is the distance of the
71st percentile from 54.5). The 71st percentile, therefore, is
54.5 + 1.4, or 55.9.

Certain percentiles can be read directly from column (4). We
know, for instance, that the 5th percentile is approximately 34.5;
that the 22nd percentile is approximately 44.5; that the 38th percen-
tile is approximately 49.5; and that the 92nd percentile is exactly
64.5. Another way of expressing the same facts is to say that 21.6%
of the seventh graders scored below 44.5, that 92% scored below
64.5, etc.


°гсеп{ е ranks may also be determined from Table 13 by inter- 


4 
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polation. Suppose, for example, we wish to calculate the e o 
score 43. From column (4) we find that 9.6% of the scores are belou 


39.5) + 8.4% (from 39.5 to 43.0) comprise 18% of the distribution, 
this percentage of N lies below score 43. Hence, the PR of 43 is 18. 
See detailed calculation below. 


Score 43.0 
(given) 


12.0% 75.0 
Score 43.0 is 3.5/5 x 12.0% or 8.4% from 39.5; hence Score 43.0 is 
| 8.4% or 18.0% into the distribution. А 
d a ond de noted i the cumulative percents in column (4) give 
the PR’s of the upper limits of the class-intervals in which the scores 
have been tabuláted. The PR of 74.5, for example, is 992; of 64.5, 
92.0; of 44.5, 21.6, еіс. These PR's are the ranks of given points in 
the distribution, and are not the PR’s of scores. А s 2 
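
The two interpolations worked out above can be checked with a short sketch in Python. The function names `percentile_point` and `percentile_rank` are illustrative choices, not part of the text; the numbers are the ones quoted from column (4), Table 13.

```python
def percentile_point(p, lower, upper):
    """Score below which p percent of the cases fall.

    lower and upper are (score_limit, cumulative_percent) pairs that
    bracket p; the score is found by simple proportion between them.
    """
    (x0, c0), (x1, c1) = lower, upper
    return x0 + (p - c0) / (c1 - c0) * (x1 - x0)


def percentile_rank(score, lower, upper):
    """Cumulative percent lying below score, by the same proportion."""
    (x0, c0), (x1, c1) = lower, upper
    return c0 + (score - x0) / (x1 - x0) * (c1 - c0)


# 71st percentile: 66.4% lies below 54.5 and 82.4% below 59.5.
p71 = percentile_point(71, (54.5, 66.4), (59.5, 82.4))

# PR of score 43: 9.6% lies below 39.5 and 21.6% below 44.5.
pr43 = percentile_rank(43.0, (39.5, 9.6), (44.5, 21.6))

print(round(p71, 1))   # 55.9
print(round(pr43, 1))  # 18.0
```

Both functions assume, as the text does, that scores are spread evenly through the class-interval.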
(b) Percentiles and percentile ranks may also be determined
quickly and fairly accurately from the ogive of the frequency distribution
plotted in Figure 10. To obtain the median, for example,
draw a line from 50 on the Y-scale parallel to the X-axis and
where this line cuts the curve drop a perpendicular to the X-axis.
This operation will locate the median at 51.5 approximately. The
exact median, calculated from Table 13, page 70, is 51.65. Q1 and
Q3 are found in the same way as the median. Po, or 01


Over to the curve, 
mately at 54, 


Score 71 on the y. “axis, going verti- 


ross to the Y-axis to locate 
scale. The PR of score 47 
is found in the same way t i ; 


It will be noted that percentiles and percentile ranks are usually
slightly in error when read from an ogive. If the curve is carefully
drawn, however, the diagram fairly large, and the scale divisions
precisely marked, percentiles and PR's may be read to a degree of
accuracy sufficient for most purposes.


3. Other uses of the ogive


(1) COMPARISON OF GROUPS

A useful over-all comparison of two or more groups is provided
when ogives representing their scores on a given test are plotted upon
the same coördinate axes. An illustration is given in Figure 11,
page 74, which shows the ogives of the scores earned by two groups
of children, 200 ten-year-old boys and 200 ten-year-old girls, upon
an arithmetic reasoning test of 60 items. Data from which these
ogives were constructed are given in Table 14.


TABLE 14 Frequency distributions of the scores made by 200 ten-year-old
boys and 200 ten-year-old girls on an arithmetic reasoning test

                     Boys                                Girls
                                 Smoothed                            Smoothed
Scores     f   Cum. f   Cum. %   cum. %       f   Cum. f   Cum. %   cum. %
60-64      0    200     100.0    100.0        0    200     100.0    100.0
55-59      2    200     100.0     99.7        1    200     100.0     99.8
50-54     25    198      99.0     95.2        0    199      99.5     99.7
45-49     48    173      86.5     82.7        9    199      99.5     98.0
40-44     47    125      62.5     62.7       27    190      95.0     92.0
35-39     19     78      39.0     43.7       44    163      81.5     78.7
30-34     26     59      29.5     28.3       43    119      59.5     59.7
25-29     15     33      16.5     18.3       40     76      38.0     38.5
20-24      9     18       9.0     10.0       10     36      18.0     23.0
15-19      7      9       4.5      4.8       20     26      13.0     11.3
10-14      2      2       1.0      1.8        1      6       3.0      6.2
 5-9       0      0       0         .3        2      5       2.5      2.3
 0-4       0      0       0        0          3      3       1.5      1.3
                                   0                                   .5
N =      200                                200

Rate = 1/200 = .005


Several interesting observations can be made from Figure 11. The
boys' ogive lies to the right of the girls' over the entire range, showing
that the boys score consistently higher than the girls. Differences
in achievement as between the two groups are shown by the distances
separating the two curves at various levels. It is clear that differences
at the extremes, between the very high-scoring and the very
low-scoring boys and girls, are not so great as are differences over
the middle range. A more detailed analysis of the achievement of
these two groups comes out in a comparison of certain points in the
distribution. The boys' median is approximately 42, the girls' 32;
and the difference between these measures is represented in Figure 11
by the line AB. The difference between the boys' Q1 and the girls' Q1
is represented by the line CD; and the difference between the two
Q3's is shown by the line EF. It is clear that the groups differ more
at the median than at either quartile, and are farther separated at
Q3 than at Q1.
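
The cumulative percents and the two medians quoted above can be recomputed from the frequencies of Table 14. The following is a minimal sketch in Python; the names `BOYS`, `GIRLS`, `cumulative_percents`, and `percentile` are illustrative, and the frequencies are entered from the lowest interval (0-4) upward.

```python
# Frequencies from Table 14, lowest interval (0-4) first.
BOYS = [0, 0, 2, 7, 9, 15, 26, 19, 47, 48, 25, 2, 0]
GIRLS = [3, 2, 1, 20, 10, 40, 43, 44, 27, 9, 0, 1, 0]
WIDTH = 5       # size of each class-interval
LOWEST = -0.5   # exact lower limit of the interval 0-4


def cumulative_percents(freqs):
    """Cumulative percent at the upper limit of each interval."""
    rate = 100 / sum(freqs)   # 1/N in percent (.5 when N = 200)
    out, running = [], 0
    for f in freqs:
        running += f
        out.append(running * rate)
    return out


def percentile(freqs, p):
    """Score below which p percent of the cases fall (interpolated)."""
    target = p / 100 * sum(freqs)
    below = 0
    for i, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            lower_limit = LOWEST + i * WIDTH
            return lower_limit + (target - below) / f * WIDTH
        below += f
    raise ValueError("p must lie between 0 and 100")


print(cumulative_percents(BOYS)[8])     # 62.5 (boys, up to 44.5)
print(round(percentile(BOYS, 50), 1))   # 41.8
print(round(percentile(GIRLS, 50), 1))  # 32.3
```

The computed medians agree with the approximate values of 42 and 32 read from Figure 11.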


FIG. 11 Ogives representing scores made by 200 boys and 200 girls
on an arithmetic reasoning test
(See Table 14, page 73)
Cumulative percents are laid off on the Y-axis; scores, from 4.5 to
59.5, on the X-axis.

Approximately 12% of the girls exceed the median of the boys in
arithmetic reasoning. Computing overlap from boys to girls, we find
that approximately 76% of the boys exceed the girls' median. The
vertical line through A (girls' median) cuts the boys' ogive at
approximately the 24th percentile. Therefore 24% of the boys fall
below the girls' median and 76% are above this point. Still another
illustration may be given. Suppose the problem is to determine what
percentage of the girls score at or above the boys' 60th percentile.
The answer is found by locating first the point where the horizontal
line through 60 cuts the boys' ogive. We then find the point on the
girls' ogive directly above this value, and from here proceed
horizontally to locate the percentile rank of this point at 93. Since
93% of the girls fall below the boys' 60th percentile, about 7% score
above this point.
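
The overlap figures read from the ogives can be approximated by calculation from the frequencies of Table 14. The sketch below, in Python, assumes linear interpolation within each class-interval; the names `percentile`, `percentile_rank`, and `percent_above` are illustrative and not taken from the text.

```python
# Frequencies from Table 14, lowest interval (0-4) first.
BOYS = [0, 0, 2, 7, 9, 15, 26, 19, 47, 48, 25, 2, 0]
GIRLS = [3, 2, 1, 20, 10, 40, 43, 44, 27, 9, 0, 1, 0]
WIDTH = 5       # size of each class-interval
LOWEST = -0.5   # exact lower limit of the interval 0-4


def percentile(freqs, p):
    """Score below which p percent of the cases fall."""
    target = p / 100 * sum(freqs)
    below = 0
    for i, f in enumerate(freqs):
        if f > 0 and below + f >= target:
            return LOWEST + i * WIDTH + (target - below) / f * WIDTH
        below += f
    raise ValueError("p must lie between 0 and 100")


def percentile_rank(freqs, score):
    """Percent of the cases falling below score."""
    below = 0.0
    for i, f in enumerate(freqs):
        lower = LOWEST + i * WIDTH
        upper = lower + WIDTH
        if score >= upper:
            below += f                            # whole interval is below
        elif score > lower:
            below += (score - lower) / WIDTH * f  # part of the interval
    return 100 * below / sum(freqs)


def percent_above(freqs, score):
    return 100 - percentile_rank(freqs, score)


girls_over_boys_median = percent_above(GIRLS, percentile(BOYS, 50))
boys_over_girls_median = percent_above(BOYS, percentile(GIRLS, 50))
girls_over_boys_p60 = percent_above(GIRLS, percentile(BOYS, 60))

print(round(girls_over_boys_median))  # 12
print(round(boys_over_girls_median))  # 76
print(round(girls_over_boys_p60))     # 6 (about 7 when read from the ogive)
```

The small difference in the last figure is of the kind to be expected between a value calculated from the table and one read from a drawn curve.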


(2) PERCENTILE NORMS

Norms are measures of achievement which represent the typical
performance of a designated group or groups. The norm for 10-year-old
boys in height, and the norm for seventh-grade pupils in
City X in arithmetic, is usually the mean or the median for the group.
But norms may be much more detailed and may be reported for
other points in the distribution as, for example, Q1, Q3, and various
percentiles.

dealing with educational 
to evaluate and compare 
f subject-matter 
63 on an achievement test in 
1 t test in English, we 


Way of knowing from the scores alone W 
good, medium, or poor, or how 


IBS 
arith 68А compare. If, however, we know that а S 
6 


Ww , 3 Д ің > 
(5997. may say at once that this student 15 average in arithmetic 


Score 9f the students score lower than he) and good 
pow him), 
Percentile norms may be determined directly from the smoothed
ogives of score distributions. Figure 12 represents the smoothed
ogives of the two distributions of scores in arithmetic reasoning given
in Table 14. Vertical lines drawn to the base line from points on the
ogive locate the various percentile points. In Table 15 below, selected
percentile norms in the arithmetic reasoning test have been tabulated



TABLE 15 Percentile norms for arithmetic reasoning test (Table 14)
obtained from smoothed ogives in Figure 12

                   Girls                  Boys
Cum. %'s     Ogive   Calculated     Ogive   Calculated
   99         52.0      49.0         57.5      54.5
   95         46.5      44.5         54.5      52.9
   90         43.5      42.7         52.5      50.9
   80         40.0      39.2         49.0      48.1
   70         37.0      36.9         46.5      46.1
   60         35.0      34.6         44.0      44.0
   50         32.5      32.5         41.5      41.8
   40         30.0      30.0         39.0      39.7
   30         27.0      27.5         35.0      34.8
   20         23.5      25.0         30.0      30.9
   10         18.5      18.0         24.5      25.2
    5         14.0      15.5         19.5      20.1

1 4 8.5 3.3 6.5 14.5 


for boys and girls separately. This table of norms may, of course, be 
extended by the addition of other intermediate or extreme values. 
Calculated percentiles are included in the table for comparison with 
percentiles read from the smoothed ogives. These calculated values 
are useful as a check on the graphically determined points, but
ordinarily need not be found.


FIG. 12 Smoothed ogives of the scores in Table 14
(Y-axis: Cumulative Percents; X-axis: Scores)


It is evident that percentile norms read from an ogive are not
strictly accurate, but the error is slight except at the top and bottom
of the distribution. Estimates of these extreme percentiles from
smoothed ogives are probably more nearly true values than are the
calculated points, since the smoothed curve represents what we might
expect to get from larger groups or in additional samplings.

The ogives in Figure 12 were smoothed in order to iron out minor
kinks and irregularities in the curves. Owing to the smoothing process,
these curves are more regular and continuous than are the original
ogives in Figure 11, page 74. The only difference between the
process of smoothing an ogive and smoothing a frequency polygon
(p. 14) is that we average cumulative percentage frequencies in the
ogive instead of actual frequencies. Smoothed percentage frequencies
are given in Table 14. The smoothed cumulative percent frequency
to be plotted above 24.5, boys' distribution, is (16.5 + 9.0 + 4.5)/3
or 10.0; for the same point, girls' distribution, (38.0 + 18.0 + 13.0)/3
or 23.0. Care must be taken at the extremes of the distribution
where the procedure is slightly different. In the boys' distribution,
for example, the smoothed cumulative percent frequency at 9.5 is
(1.0 + 0 + 0)/3 or .3%, and at 59.5, it is (99.0 + 100.0 + 100.0)/3 or
99.7%. At 4.5 and 64.5, both of which lie outside the boys' distribution,
the smoothed cumulative percentage frequencies are (0 + 0 + 0)/3
or 0 and (100 + 100 + 100)/3 or 100, respectively. Note that the
smoothed ogive extends one interval beyond the original at both
extremes of the distribution.
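
The smoothing rule just described, a running average of three adjacent cumulative percents with one extra point added at each extreme, may be sketched in Python as follows. The function name `smooth_ogive` is illustrative; the list is assumed to begin at the 0% point and end at the 100% point of the original ogive, so that the values lying beyond either end are 0 and 100.

```python
def smooth_ogive(first_limit, width, cum_pcts):
    """Running average of three adjacent cumulative percents.

    cum_pcts runs from the 0% point of the ogive to its 100% point.
    Returns (upper_limit, smoothed_percent) pairs, starting one
    interval below first_limit and ending one interval above the top.
    """
    padded = [0.0, 0.0] + list(cum_pcts) + [100.0, 100.0]
    smoothed = [sum(padded[i:i + 3]) / 3 for i in range(len(padded) - 2)]
    start = first_limit - width
    return [(start + i * width, value) for i, value in enumerate(smoothed)]


# Boys' cumulative percents from Table 14 at the limits 9.5, 14.5, ..., 59.5.
BOYS_CUM = [0.0, 1.0, 4.5, 9.0, 16.5, 29.5, 39.0, 62.5, 86.5, 99.0, 100.0]
boys = dict(smooth_ogive(9.5, 5, BOYS_CUM))

print(round(boys[24.5], 1))    # 10.0
print(round(boys[9.5], 1))     # 0.3
print(round(boys[59.5], 1))    # 99.7
print(boys[4.5], boys[64.5])   # 0.0 100.0
```

The output has two more points than the input, which corresponds to the remark that the smoothed ogive extends one interval beyond the original at both extremes.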

There is little justification for smoothing an ogive which is already
regular or an ogive which is very jagged and irregular. In the
first instance, smoothing accomplishes little if anything; in the second,
it may seriously mislead. A smoothed curve shows what we
might expect to get if the test or sampling, or both, were different
(and perhaps better) than they actually were. Smoothing should
never be a substitute for getting additional data or for constructing
a better test. It should certainly be avoided when the group is
small and the ogive very irregular. Smoothing is perhaps most useful
when the ogives show small irregularities here and there (see Figure 11)
which may reasonably be assumed to have arisen from small
and not very important factors.


IV. Other Graphical Methods


Data obtained from many problems in mental measurement, espe- 
cially those which involve the study of changes attributable to 
growth, practice, learning, and fatigue, may be treated profitably 
by graphical methods. Two widely used devices are the line graph,
frequently found in experimental psychology, and the bar diagram,
more often met with, perhaps, in education. These two methods will
be described in this section.


1. The line graph


Figure 13 shows an age-progress curve. This graph represents the 
change in “logical memory for a connected passage" in boys and 
girls from 8 to 18 years old. Norms for adults are also included 
on the diagram. Age is represented on the horizontal or X-axis
and “average number of ideas reproduced” is marked off on the
vertical or Y-axis. There is a small but consistent sex difference
throughout, the girls being higher on the average at each age.


FIG. 13 Logical memory. Age is represented on X-line (horizontal);
score, i.e., number of ideas remembered, on Y-line (vertical)
(After Pyle)


Figure 14 illustrates the learning or practice curve. These curves
show the improvement in sending and receiving telegraphic messages,
resulting from successive trials at the same task over a period
of forty-eight weeks. Improvement as measured by the number of
letters sent or received per minute is indicated along the Y-axis. Weeks
of practice at the given task are represented by equal intervals on
the X-axis.


FIG. 14 Improvement in telegraphy. Weeks of practice on X-line;
number of letters per minute on Y-line
(After Bryan and Harter)

Figure 15 is a performance or practice “curve.” It represents
twenty-five successive trials with the hand dynamometer made by
one man and one woman. A marked sex difference in strength of
grip is apparent throughout the practice period. Also as the experiment
progressed a tendency to fatigue is evident in both subjects.


FIG. 15 Hand dynamometer readings in kilograms for twenty-five
successive grips at intervals of ten seconds. Two subjects, a man
and a woman


Figure 16 is Ebbinghaus' well-known “curve of retention.” This
curve represents memory retention as measured by the percentage of
the original material retained after the passage of different time
intervals. The time intervals between learning and relearning are
laid off on the X-axis; and the percent retained, as measured by
relearning, on the Y-axis.


FIG. 16 Curve of retention. The numbers on the baseline give hours
elapsed from time of learning (1 hr., 9 hr., 24 hr., 48 hr., 144 hr.);
numbers along Y-axis give percent retained


2. The bar diagram 


The bar graph is sometimes used in psychology to compare the 
relative amounts of some attribute (height, intelligence, educational 
achievement, etc.) possessed by two or more groups. In education 
the bar graph may be used to com 


; distribution of student time 
s by states or districts; relative 
expenditures for various purposes. A common form of the bar 
graph is that in which a set of bars is used, the lengths of the bars 
being proportional to the amounts of the variable possessed. For 
emphasis, a space is usually left between the bars, which are drawn 
side by side and may be either vertical or horizontal.


A bar graph is shown in Figure 17. These bars represent
the percentage of officers in various branches of the military service
in World War I who received grades of A and B or C upon the
Army Alpha Examination. The bars are arranged in order, the group
receiving the highest percent of A's and B's being placed at the top.
It is clear from the diagram that the Engineers, who ranked first,
received about 95% A's and B's and about 5% C's. The Veterinary
Corps, which ranked last, received about 60% A's and B's and
40% C's.


FIG. 17 Comparative bar graphs. The bars represent the percentage
in each division of the military service receiving A's and B's or C's


Another illustration of a bar graph is shown in Figure 18. The two
parallel rectangles or “bars” represent student enrollment in two city
high schools. Each bar is divided into four parts to represent freshmen,
sophomores, juniors, and seniors. The size of a division is
proportional to the percentage which each class is of the whole group.
This type of graph is often called a divided-bar graph.


FIG. 18 Divided bar graphs. The two bars represent student enrollment
in two city high schools. Each bar is divided into four divisions.
The length of a division shows the proportion or percentage of
students in that class


PROBLEMS 


1. The following distributions represent the achievement of two groups, 

A and B, upon a memory test. 

(a) Plot cumulative frequency graphs of Group A's and of Group B's
scores, observing the 75% rule.

(b) Plot ogives of the two distributions A and B upon the same axes.

(c) Determine P30, P60, and P90 graphically from each of the ogives and
compare graphically determined with calculated values.

(d) What is the percentile rank of score 55 in Group A’s distribution? 
In Group B’s distribution? 


(е) A percentile rank of 70 in Group A corresponds to what percentile 
rank in Group В? 


(f) What percent of Group A exceeds the median of Group B? 


Scores Group A Group B
79-83 6 8
74-78 7 8
69-73 8 9
64—68 “ 1 
59-63 » i a 
54-58 15 18 

49-53 23 19
44-48 16 11
39-43 10 13
34-38 12 8
29-33 6 7
24-28 3 2


2. Construct an ogive of the following distribution of scores.

Scores               f
159.5 to 169.5       1
149.5 to 159.5       5
139.5 to 149.5      13
129.5 to 139.5      45
119.5 to 129.5      40
109.5 to 119.5      30
 99.5 to 109.5      51
 89.5 to  99.5      48
 79.5 to  89.5      36
 69.5 to  79.5      10
 59.5 to  69.5       5
 49.5 to  59.5       1
               N = 285

Read off percentile norms for the cumulative percentages:
99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, and 1.

3. Given the following data from five cities in the United States,
represent the facts graphically by means of a bar graph.

          Percent of population which is
City   Native White   Foreign-born White   Negro
 A          65                30             05
 B          60                10             30
 C          50                45             05
 D          40                20             40
 E          30                10             60


ANSWERS

1. (c)         Group A             Group B
           Ogive     Cal.      Ogive     Cal.
   P30      46.0    45.81       48.5    48.60
   P60      56.0    55.77      59.75    59.85
   P90      74.0    73.64       75.5    74.81

   (e) 62   (f) 39-40% of Group A exceed the median of Group B.

2. Read from ogive:
   Cum. percents:  99   95     90     80     70     60     50   40   30
   Percentiles:    159  142.5  137.5  131.5  124.5  116.5  107  102  96.5

   Cum. percents:  20   10    5    1
   Percentiles:    91   82.5  79   64.5


ADDITIONAL PROBLEMS AND QUESTIONS ON CHAPTERS 1-4


1. Describe the characteristics of those distributions for which the mean
is not an adequate measure of central tendency.

2. When is it inadvisable to use the coefficient of variation?

3. What is a multimodal distribution?

4. A student writes in a theme that by the application of eugenics it
would be possible to raise the intelligence of the race, so that more
people would be above the median I.Q. of 100. Comment on this.

5. Why cannot the σ of one test usually be compared directly with the σ
of another test?

6. What effect will an increase in N probably have upon Q?

7. What is the difference between a percentile and the ordinary percent
grade used in school?

8. Does a percentile rank of 65 earned by a given pupil mean that 65%
of the group make scores above him; that 65% make the same score;
or that 65% make scores below him?

9. Calculate the mean, median, mode, Q, and SD for each of the following
distributions:


(1) Scores f (2) Scores f (3) Scores f
90-99 2 14-15 3 25 А 
80-89 12 12-13 8 24 4 
70-79 22 10-11 15 23 i
60-69 20 8-9 20 22 E 
50-59 14 6-7 10 21 
40-49 4 4-5 4 20 2 
30-39 1 A an 19 Д 

N = 75 N = 60


10. (a) Plot the distribution in 9 (1) as a frequency polygon and
histogram upon the same coördinate axes.

(b) Plot the distribution in 9 (2) as an ogive. Locate graphically
the median, Q1, and Q3. Determine the PR of score 9; of score 12.


ANSWERS 


9. (1) Mean = 68.10      (2) Mean = 9.23      (3) Mean = 22.04
       Median = 68.75        Median = 9.10        Median = 22.06
       Mode = 70.05          Mode = 8.84          Mode = 22.10
0- 901 Q = 1.69 2 olf 
SD — 12.50 SD — 248 SD= 13944 


10. (b) Median = 9.0; Q1 = 7.5; Q3 = 11.0 (Read from ogive)
    PR of 9 = 50; of 12 = 84.5


32 


THE NORMAL PROBABILITY CURVE 


* 


* 


I. The Meaning and Importance of the Normal
Probability Distribution


1. Introduction


In Figure 19 are four diagrams, two polygons and two histograms,
which represent frequency distributions of data drawn from anthropometry,
psychology, and meteorology. It is apparent, even upon
superficial examination, that all of these graphs have the same general
form: the measures are concentrated closely around the center
and taper off from this central high point or crest to the left and
right. There are relatively few measures at the “low-score” end of
the scale; an increasing number up to a maximum at the middle position;
and a progressive falling-off toward the “high-score” end of the
scale. If we divide the area under each curve (the area between the
curve and the X-axis) by a line drawn perpendicularly through the
central high point to the baseline, the two parts thus formed will be
similar in shape and very nearly equal in area. It is clear, therefore,
that each figure exhibits almost perfect bilateral symmetry. The
perfectly symmetrical curve, or frequency surface, to which all of
the graphs in Figure 19 approximate, is shown in Figure 20. This
bell-shaped figure is called the normal probability curve, or simply
the normal curve, and is of great value in mental measurement. An
understanding of the characteristics of the frequency distribution
represented by the normal curve is essential to the student of experimental
psychology and mental measurement. This chapter, therefore,
will be concerned with the normal distribution, and its frequency
polygon, the normal probability curve.


FIG. 19 Frequency distributions drawn from different fields
1. Form L I.Q. distribution and best-fitting normal curve, ages 2½ to 18.
   (From McNemar, Quinn, The Revision of the Stanford-Binet Scale, p. 19)
2. Memory span for digits, 123 adult women students. (After Thorndike.)
3. Statures of 8585 adult males born in the British Isles. (After Yule.)
4. Frequency distribution of barometer heights at Southampton: 4748
   observations. (After Yule.)


FIG. 20 Normal probability curve
(Baseline marked from −3σ to +3σ, with the mean at 0)


2. Elementary principles of probability 


Perhaps the simplest approach to an understanding of the normal
probability curve is through a consideration of the elementary principles
of probability. As used in statistics, the “probability” of a
given event is defined as the expected frequency of occurrence of this
event among events of a like sort. This expected frequency of occurrence
may be based upon a knowledge of the conditions determining
the occurrence of the phenomenon, as in dice-throwing or coin-tossing,
or upon empirical data, as in mental and social measurements.

The probability of an event may be stated most simply, perhaps,
as a ratio. We know, for example, that the probability of an unbiased
coin falling heads is 1/2, and that the probability of a die
showing a two-spot is 1/6. These ratios, called probability ratios,
are defined by that fraction the numerator of which equals the
desired outcome or outcomes and the denominator of which equals
the total possible outcomes. A probability ratio always falls between
the limits .00 (impossibility of occurrence) and 1.00 (certainty of
occurrence). Thus the probability that the sky will fall is .00; that
an individual now living will some day die is 1.00. Between these
limits are all possible degrees of likelihood which may be expressed
by appropriate ratios.

Let us now apply these simple principles of probability to the
specific case of what happens when we toss coins.* If we toss one
coin, obviously it must fall either heads (H) or tails (T) 100% of the
time; and furthermore, since there are only two possible outcomes
in a given throw, a head or a tail is equally probable. Expressed
as a ratio, therefore, the probability of H is 1/2; of T 1/2; and

(H + T) = 1/2 + 1/2 = 1.00

If we toss two coins, (a) and (b), at the same time, there are four
possible arrangements which the coins may take:

    (1)      (2)      (3)      (4)
   a  b     a  b     a  b     a  b
   H  H     H  T     T  H     T  T

Both coins (a) and (b) may fall H; (a) may fall H and (b) T;
(b) may fall H and (a) T; or both coins may fall T. Expressed as
ratios, the probability of two heads is 1/4 and the probability of two
tails 1/4. Also, the probability of an HT combination is 1/4, and of
a TH combination 1/4. And since it ordinarily makes no difference
which coin falls H or which falls T, we may add these two ratios (or
double the one) to obtain 1/2 as the probability of an HT combination.
The sum of our probability ratios is 1/4 + 1/2 + 1/4 or 1.00.

Let us go a step farther and increase the number of coins to three.
If we toss three coins (a), (b), and (c) simultaneously, there are
eight possible outcomes:

    (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
   a b c   a b c   a b c   a b c   a b c   a b c   a b c   a b c
   H H H   H H T   H T H   T H H   H T T   T H T   T T H   T T T

Expressed as ratios, the probability of three heads is 1/8 (combination 1);
of two heads and one tail 3/8 (combinations 2, 3, and 4);
of one head and two tails 3/8 (combinations 5, 6, and 7); and of three
tails 1/8 (combination 8). The sum of these probability ratios is
1/8 + 3/8 + 3/8 + 1/8, or 1.00.

* Coin-tossing and dice-throwing furnish easily understood
illustrations of the so-called “laws of chance.”
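
The listing of combinations for two and for three coins can be reproduced mechanically. The following Python sketch enumerates every arrangement of heads and tails and counts how many contain each number of heads; the function name `head_count_probabilities` is an illustrative choice, and each arrangement is assumed equally likely, as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_probabilities(n_coins):
    """Probability ratio for each possible number of heads."""
    arrangements = list(product("HT", repeat=n_coins))  # e.g. ('H', 'T')
    counts = Counter(a.count("H") for a in arrangements)
    total = len(arrangements)
    return {heads: Fraction(c, total)
            for heads, c in sorted(counts.items(), reverse=True)}


two = head_count_probabilities(2)
three = head_count_probabilities(3)

print(two[2], two[1], two[0])                  # 1/4 1/2 1/4
print(three[3], three[2], three[1], three[0])  # 1/8 3/8 3/8 1/8
print(sum(three.values()))                     # 1
```

Exact fractions are used so that the ratios print in the same form as in the text and sum to exactly 1.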

By exactly the same method used above for two and for three
coins, we can determine the probability of different combinations of
heads and tails when we have four, five, or any number of coins.
These various outcomes may be obtained in a somewhat more direct
way, however, than by writing down all of the different combinations
which may occur. If there are n independent factors, the probability
of the presence or absence of each being the same, the “compound”
probabilities of the appearance of various combinations of factors
will be expressed by expansion of the binomial $(p + q)^n$. In this
expression p equals the probability that a given event will happen,
q the probability that the event will not happen, and the exponent n
indicates the number of factors (e.g., coins) operating to produce the
final result.* If we substitute H for p and T for q (tails = non-heads),
we have for two coins $(H + T)^2$; and squaring, the binomial
$(H + T)^2 = H^2 + 2HT + T^2$. This expansion may be written,

    1 $H^2$   1 chance in 4 of 2 heads; probability ratio = 1/4
    2 $HT$    2 chances in 4 of 1 head and 1 tail; probability ratio = 1/2
    1 $T^2$   1 chance in 4 of 2 tails; probability ratio = 1/4
    Total = 4

These outcomes are identical with those obtained above by listing
the three different combinations possible when two coins are tossed.
If we have three independent factors operating, the expression
$(p + q)^n$ becomes for three coins $(H + T)^3$. Expanding this binomial,
we get $H^3 + 3H^2T + 3HT^2 + T^3$, which may be written,

    1 $H^3$     1 chance in 8 of 3 heads; probability ratio = 1/8
    3 $H^2T$    3 chances in 8 of 2 heads and 1 tail; probability ratio = 3/8
    3 $HT^2$    3 chances in 8 of 1 head and 2 tails; probability ratio = 3/8
    1 $T^3$     1 chance in 8 of 3 tails; probability ratio = 1/8
    Total = 8

Again these results are identical with those got by listing the four
different combinations possible when three coins are tossed.

* We may, for example, consider our coins to be independent factors, the
occurrence of a head to be the presence of a factor and the occurrence of a tail
the absence of a factor. Factors will then be “present” or “absent” in the
various heads-tails combinations.

The binomial expansion may be applied still more generally to


those cases in which there are a larger number of independent factors
operating. If we toss ten coins simultaneously, for instance, we have
by analogy with the above, $(p + q)^{10}$. This expression may be written
$(H + T)^{10}$, H standing for the probability of a head, T for the
probability of a non-head (tail), and 10 for the number of coins
tossed. When the binomial $(H + T)^{10}$ is expanded, the terms are

$H^{10} + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6
+ 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^{10}$

which may be summarized as follows:

                                                                Probability
                                                                   Ratio
    1 $H^{10}$       1 chance in 1024 of all coins falling heads    1/1024
   10 $H^9T$        10 chances in 1024 of 9 heads and 1 tail       10/1024
   45 $H^8T^2$      45 chances in 1024 of 8 heads and 2 tails      45/1024
  120 $H^7T^3$     120 chances in 1024 of 7 heads and 3 tails     120/1024
  210 $H^6T^4$     210 chances in 1024 of 6 heads and 4 tails     210/1024
  252 $H^5T^5$     252 chances in 1024 of 5 heads and 5 tails     252/1024
  210 $H^4T^6$     210 chances in 1024 of 4 heads and 6 tails     210/1024
  120 $H^3T^7$     120 chances in 1024 of 3 heads and 7 tails     120/1024
   45 $H^2T^8$      45 chances in 1024 of 2 heads and 8 tails      45/1024
   10 $HT^9$        10 chances in 1024 of 1 head and 9 tails       10/1024
    1 $T^{10}$       1 chance in 1024 of all coins falling tails    1/1024
  Total = 1024

These data are represented graphically in Figure 21 by a histogram
and frequency polygon plotted on the same axes. The eleven terms of
the expansion have been laid off at equal distances along the X-axis,
and the “chances” of the occurrence of each combination of H's and
T's are plotted as frequencies on the Y-axis. The result is a symmetrical
frequency polygon with the greatest concentration in the
center and the “scores” falling away by corresponding decrements
above and below the central high point. Figure 21 represents the
results to be expected theoretically when ten coins are tossed 1024
times.


FIG. 21 Probability surface obtained from the expansion of $(H + T)^{10}$


Many experiments have been conducted in which coins were
tossed or dice thrown a great many times. In one experiment,*
twelve dice were thrown 4096 times. Each four-, five-, and six-spot
combination was taken as a “success” and each one-, two-, and three-spot
combination as a “failure.” Hence the probability of success
and the probability of failure were the same. In a throw showing
the faces 3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, and 3, there would be five successes
and seven failures. The observed frequency of the different
numbers of successes and the theoretical outcomes obtained from the
expansion of the binomial expression $(p + q)^{12}$ have been plotted on
the same axes in Figure 22. The student will note that the observed
frequencies correspond quite closely to the theoretical except for a
tendency to shift slightly to the right. If, as an experiment, the
reader will toss ten coins 1024 times his results will be in close agreement
with the theoretical outcomes shown in Figure 21.

* Weldon's experiment; see Yule, G. U., An Introduction to the Theory of
Statistics (London: C. Griffin and Co., 1932), 10th ed.


FIG. 22 Comparison of observed and theoretical results in throwing
twelve dice 4096 times
(After Yule.)
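
The coefficients of the binomial expansion need not be worked out by repeated multiplication. As a minimal sketch, the following Python code obtains them from the standard-library function `math.comb`; the helper names `binomial_terms` and `probability_ratios` are illustrative, and equal probabilities of heads and tails are assumed throughout.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n):
    """Coefficients of H^(n-k) T^k in (H + T)^n, for k = 0, 1, ..., n."""
    return [comb(n, k) for k in range(n + 1)]


def probability_ratios(n):
    """Chances of each combination out of the 2^n equally likely outcomes."""
    total = 2 ** n
    return [Fraction(c, total) for c in binomial_terms(n)]


print(binomial_terms(2))        # [1, 2, 1]
print(binomial_terms(3))        # [1, 3, 3, 1]
print(binomial_terms(10))       # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(binomial_terms(10)))  # 1024
print(sum(binomial_terms(12)))  # 4096
```

The totals 1024 and 4096 are the numbers of equally likely arrangements for ten coins and for twelve dice scored as success or failure, which is why those numbers of throws give whole-number theoretical frequencies.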

Throughout the discussion in this section, we have taken the probability
of occurrence (e.g., H) and the probability of non-occurrence
(non-H or T) of a given factor to be the same. This is not a necessary
condition, however. For instance, the probability of an event's
happening may be only 1/5; of its not happening, 4/5. Any probability
ratio is possible as long as (p + q) = 1.00. But distributions
obtained from the expansion of $(p + q)^n$ when p is not equal to q are
“skewed” or asymmetrical and are not normal (p. 116).
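
The effect of unequal p and q may be seen by expanding $(p + q)^n$ numerically. The sketch below, in Python, uses the ratios 1/5 and 4/5 mentioned in the text with n = 10 factors; the choice of n = 10 and the function name `binomial_distribution` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_distribution(n, p):
    """Terms of (p + q)^n: probability of exactly k occurrences, k = 0..n."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


skewed = binomial_distribution(10, Fraction(1, 5))
even = binomial_distribution(10, Fraction(1, 2))

print(sum(skewed))                       # 1
print(skewed.index(max(skewed)))         # 2 (peak well below the middle, 5)
print(even == even[::-1])                # True  (symmetrical)
print(skewed == skewed[::-1])            # False (asymmetrical)
print([round(float(x), 3) for x in skewed[:5]])
```

With p = 1/5 the largest term falls at 2 occurrences rather than at the middle of the range, and the terms tail off much farther on one side than on the other.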


3. Use of probability curve in mental measurement


The frequency curve plotted in Figure 21 from the expansion of
the expression $(H + T)^{10}$ is a symmetrical many-sided polygon. If
the number of factors (e.g., coins) determining this polygon were
increased from 10 to 20, to 30, and then to 100, say (the baseline
extent remaining the same), the faces of the polygon would increase
regularly in number. With each increase in the number of factors,
the faces of the figure would become shorter, and the points on the
frequency surface would move closer together. Finally, when the
number of factors became very large, that is, when n in the expression
$(p + q)^n$ became infinite, the polygon would exhibit a perfectly
smooth surface like that of the curve in Figure 20. This “ideal”
polygon or “normal” curve represents the frequency of occurrence of
various combinations of a very large number of equal, similar, and
independent factors (e.g., coins), when the probability of the appearance
(e.g., H) or non-appearance (e.g., T) of each factor is the
same.
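
The approach of the binomial polygon to the smooth normal curve can be illustrated numerically. The following Python sketch compares the terms of $(H + T)^n$ with the ordinates of a normal curve having the same mean and standard deviation; the mean n/2 and standard deviation √n/2 are the standard values for equal p and q and are not stated in the passage above, and the function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def binomial_probabilities(n):
    """Probability of k heads in n tosses of an unbiased coin, k = 0..n."""
    return [comb(n, k) / 2 ** n for k in range(n + 1)]


def normal_ordinates(n):
    """Heights of the normal curve with mean n/2 and SD sqrt(n)/2 at k = 0..n."""
    mean = n / 2
    sd = sqrt(n) / 2
    return [exp(-((k - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))
            for k in range(n + 1)]


def largest_gap(n):
    """Largest difference between a binomial term and the normal ordinate."""
    pairs = zip(binomial_probabilities(n), normal_ordinates(n))
    return max(abs(b - g) for b, g in pairs)


for n in (10, 100, 1000):
    print(n, largest_gap(n))
```

The printed gap shrinks as n grows, which is the numerical counterpart of the faces of the polygon becoming shorter and the surface smoother.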

When we compare the four graphs plotted from measures of height,
intelligence, memory span, and barometric readings in Figure 19,
with the normal probability curve in Figure 20, the similarity of
these diagrams to the normal curve is clearly evident. The resemblance
of these distributions to the normal seems to lie in their
symmetrical, bell-shaped form. This observation may be stated in the
form of a “principle” as follows: measurements of many natural
phenomena and of many mental and social traits under certain conditions
tend to be distributed symmetrically about their means in
proportions which approximate those of the normal probability
distribution.

Much evidence has accumulated to show that the normal distribu- 
tion serves to describe the frequency of occurrence of many variable 
facts with a relatively high degree of accuracy. Various phenomena 
which follow the normal probability curve (at least approximately) 
may be classified as follows: 


1. Biological statistics: the proportion of male to female births 
for the same country or community over a period of years; the pro- 
portion of different types of plants and animals in cross-fertilization 
(the Mendelian ratios). 

2. Anthropometrical data: height, weight, cephalic index, etc., for 
large groups of the same age and sex. 

3. Social and economic data: rates of birth, marriage, or death
under certain constant conditions; wages and output of large numbers
of workers in the same occupation under comparable conditions.

4. Psychological measurements: intelligence as measured by standard
tests; speed of association, perception-span, reaction-time;
educational test scores, e.g., in spelling, arithmetic, reading.

5. Errors of observation: measures of height, speed of movement, 
linear magnitudes, physical and mental traits, and the like, contain 
errors which are as likely to cause them to deviate above as below 
their true values. Chance errors of this sort vary in magnitude and 
sign and occur in frequencies which follow closely the normal prob- 
ability curve.* 


It is an interesting speculation that many frequency distributions
of scores and other measures are similar to those obtained by tossing
coins or throwing dice because the former, like the latter, are
actually probability distributions. The symmetrical normal
distribution, as we have seen, represents the probability of
occurrence of the various possible combinations of a great many
factors (e.g., coins). In a normal distribution all of the n factors
are taken to be similar, independent, and equal in strength; and the
probability that each will be present (e.g., show an H) or absent
(e.g., show a T) is the same.

The appearance on a coin of a head or a tail is undoubtedly
determined by a large number of small (or “chance”) influences as
liable to work one way as another. The twist with which the coin is
spun may be important, as well as the height from which it is thrown,
the weight of the coin, the kind of surface upon which it falls, and
many other circumstances of a like sort. By analogy, the presence or
absence of each one of the large number of genetic factors which
determine the shape of a man’s head, or his intelligence, or his
personality, may depend upon a host of adventitious influences whose
net effect we call “chance.”

* This topic is treated in Chapter 8.

But the striking similarity of obtained and probability distribu- 
tions should not lead us to conclude that all distributions of mental 
and physical traits which exhibit the bell-shaped form have neces- 
sarily arisen through the operation of those principles which govern 
the appearance of dice or coin combinations. The factors which 
determine musical ability, let us say, or mechanical skill are too little 
known to justify the assumption, a priori, that they combine in the 
same proportions as do the head and tail combinations in “chance” 
distributions of coins. Moreover, the psychologist usually constructs 
his tests with the normal hypothesis definitely in mind. The result- 
ing symmetrical distribution is to be taken, then, as evidence of the 
success of his efforts rather than as conclusive proof of the “normal- 
ity” of the trait being measured.* 

The selection of the normal rather than some other type curve is
sufficiently warranted by the fact that this distribution generally
does fit the data better, and is more useful. But the “theoretical
justification and the empirical use of the normal curve are two quite
different matters.” †

* McNemar, Q., The Revision of the Stanford-Binet Scale (Boston:
Houghton Mifflin Co., 1942), Chapter [...].
† Jones, D. C., A First Course in Statistics (London: G. Bell and
Sons, 1921), p. [...].


II. Properties of the Normal Probability Distribution

1. The equation of the normal curve


The equation of the normal probability curve reads

$$y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-\frac{x^{2}}{2\sigma^{2}}} \qquad (18)$$

(equation of the normal probability curve)

in which

x = scores (expressed as deviations from the mean) laid off along
    the baseline or X-axis.
y = the height of the curve above the X-axis, i.e., the frequency of a
    given x-value or the number achieving a certain score.


The other terms in the equation are constants:


N = number of cases.
σ = standard deviation of the distribution.
π = 3.1416 (the ratio of the circumference of a circle to its
    diameter).
e = 2.7183 (base of the Napierian system of logarithms).


When N and σ are known, it is possible from equation (18) to compute
(1) the frequency (or y) of a given value x, i.e., the number of
individuals making a certain score; and (2) the number, or percentage,
of individuals scoring between two points, or above or below a given
point in the distribution. But these calculations are rarely
necessary, as tables are available from which this information may be
readily obtained. A knowledge of these tables (Table A, p. 424) is
extremely valuable in the solution of a number of problems. For this
reason it is very desirable that the construction and use of Table A
be clearly understood.
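The two computations just described can be sketched directly, as an illustration of what the tables save us from doing by hand. The function names below and the figures N = 1000 and σ = 10 are hypothetical and chosen only for the example; the area is obtained from the error function in Python's standard library rather than from Table A.

```python
from math import sqrt, pi, exp, erf

def normal_ordinate(x, n_cases, sigma):
    """Equation (18): height y of the normal curve at deviation x from the mean."""
    return n_cases / (sigma * sqrt(2 * pi)) * exp(-x**2 / (2 * sigma**2))

def cases_between(x_low, x_high, n_cases, sigma):
    """Number of cases lying between two deviations from the mean."""
    def cumulative(x):
        # Proportion of the total area lying below deviation x.
        return 0.5 * (1 + erf(x / (sigma * sqrt(2))))
    return n_cases * (cumulative(x_high) - cumulative(x_low))

# Hypothetical figures: N = 1000 cases, sigma = 10 score units.
print(round(normal_ordinate(0, 1000, 10), 1))    # ordinate at the mean
print(round(cases_between(0, 10, 1000, 10), 1))  # cases between the mean and +1 sigma
```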


2. Table of areas under the normal curve 


Table A gives the fractional parts of the total area under the
normal curve found between the mean and ordinates (y’s) erected at
various distances from the mean. In Table A distances along the
X-axis are measured in σ units (see Fig. 20). The total area under
the curve (the number of scores in the distribution) is taken
arbitrarily to be 10,000, because of the greater ease with which
fractional parts of the total area may then be calculated.

The first column of the table, x/σ, gives distances in tenths of σ
measured off on the baseline of the normal curve from the mean as
origin. We have already learned that x = X − M, i.e., that x measures
the deviation of a score X from M. If x is divided by σ, deviation
from the mean is expressed in σ-units. Such σ-deviation scores are
often called standard scores, or z-scores (z = x/σ). Distances from
the mean in hundredths of σ are given by the headings of the columns.
To find the number of cases in a normal distribution between the mean
and the ordinate erected at a distance of 1σ from the mean, go down
the x/σ column until 1.0 is reached, and in the next column under .00
take the entry opposite 1.0, viz., 3413. This figure means that 3413
cases in 10,000, or 34.13% of the entire area of the curve, lie
between the mean and 1σ. Put more exactly, 34.13% of the cases in a
normal distribution fall within the area bounded by the baseline of
the curve, the ordinate erected at the mean, the ordinate erected at
a distance of 1σ from the mean, and the curve itself (see Fig. 20,
p. 87). To find the percentage of the distribution between the mean
and 1.57σ, say, go down the x/σ column to 1.5, then across
horizontally to the column headed .07, and take the entry 4418. This
means that in a normal distribution, 44.18% of the area (N) lies
between the mean and 1.57σ.

We have so far considered only σ-distances measured in the positive
direction from the mean; that is, we have taken account only of the
right half (the high-score end) of the normal curve. Since the curve
is bilaterally symmetrical, the entries in Table A apply to
σ-distances measured in the negative direction (to the left) as well
as to those measured in the positive direction. To find the percentage
of the distribution between the mean and −1.26σ, for instance, take
the entry in the column headed .06, opposite 1.2 in the x/σ column.
This entry (3962) tells us that 39.62% of the cases in the normal
distribution fall between the mean and −1.26σ. The percentage of
cases between the mean and −1σ is 34.13; and the student will now be
able to verify the statement made on page 52 that between the mean
and ±1σ are 68.26% of the cases in a normal distribution (see also
Fig. 20).
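The entries read from Table A in the examples above can be reproduced by computation. This is a minimal sketch, assuming Python's `statistics.NormalDist`; the function name `table_a_entry` is an illustrative choice and not a name used in the text.

```python
from statistics import NormalDist

def table_a_entry(z):
    """Cases per 10,000 lying between the mean and an ordinate z sigma-units away."""
    # The curve is bilaterally symmetrical, so the sign of z does not matter.
    return 10_000 * (NormalDist().cdf(abs(z)) - 0.5)

for z in (1.00, 1.57, -1.26, 3.00):
    print(z, round(table_a_entry(z), 1))
```

Rounded to whole cases, the values for 1.00, 1.57, and −1.26 agree with the table entries 3413, 4418, and 3962 quoted above.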

While the normal curve does not actually meet the baseline until we
are at infinite distances to the right and left of the mean, for
practical purposes the curve may be taken to end at points −3σ and
+3σ distant from the mean. Table A shows that 4986.5 cases in the
total 10,000 fall between the mean and +3σ; and 4986.5 cases will, of
course, fall between the mean and −3σ. Therefore, 9973 cases in
10,000, or 99.73% of the entire distribution, lie within the limits
−3σ and +3σ. By cutting off the curve at these two points, therefore,
we disregard only .27 of 1% of the distribution, a negligible amount
except in very large samples.


3. Relationships among the constants of the normal probability curve 


In the normal probability curve, the mean, the median, and the mode
all fall exactly at the midpoint of the distribution.

The measures of variability include certain constant fractions of
the total area of the normal curve, which may be read from Table A.
Between the mean and ±1σ lie the middle two-thirds (approximately) of
the cases in the normal distribution. Between the mean and ±2σ are
found 95% (approximately) of the distribution; and between the mean
and ±3σ are found 99.7% (approximately 100%) of the distribution.
There are 68 chances (approximately) in 100 that a score will lie
within ±1σ from the mean in the normal distribution; there are 95
chances in 100 that it will lie within ±2σ from the mean; and 99.7
chances in 100 that it will lie within ±3σ from the mean.

Instead of σ the Q may be used as the unit of measurement in
determining areas within given parts of the normal curve. In the
normal curve the Q (p. 46) is generally called the probable error or
PE. The relationships between PE and σ are given in the following
equations:

PE = .6745σ
σ = 1.4826 PE


from which it is seen that σ is always about 50% larger than the
PE (p. 52).

By interpolation in Table A we find that ±.6745σ or ±1 PE include
the 25% just above and the 25% just below the mean. This part of the
normal curve, sometimes called the “middle 50,” is important because
it is often taken to define the range of “normal” performance. The
upper 25% is considerably better, and the lowest 25% considerably
poorer in performance than the typical middle or average group. From
Table A we find also that ±2 PE (or ±1.3490σ) from the mean include
82.26% of the measures in the normal curve; that ±3 PE (or ±2.0235σ)
include 95.70%; and that ±4 PE (or ±2.6980σ) include 99.30%.
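The PE relationships can be verified without interpolating in the table. The sketch below is illustrative only: it assumes Python's `statistics.NormalDist`, and the names `pe_in_sigma` and `percent_within` are introduced here for the example. Computed values may differ from the tabled ones in the last decimal place because the table is rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sigma 1

# PE is the sigma-distance that cuts off the middle 50% of the curve,
# i.e., the point below which 75% of the cases lie.
pe_in_sigma = std_normal.inv_cdf(0.75)

def percent_within(k_pe):
    """Percentage of cases lying within plus or minus k_pe probable errors of the mean."""
    z = k_pe * pe_in_sigma
    return 100 * (std_normal.cdf(z) - std_normal.cdf(-z))

print(round(pe_in_sigma, 4), round(1 / pe_in_sigma, 4))  # PE in sigma units, sigma in PE units
for k in (1, 2, 3, 4):
    print(k, round(percent_within(k), 2))
```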


III. Measuring Divergence from Normality 


1. Skewness 


In a frequency polygon or histogram, usually the first thing which
strikes the eye is the degree of symmetry in the figure. In the normal
curve the mean, the median, and the mode all coincide and there is
perfect balance between the right and left halves of the figure. A
distribution is said to be “skewed” when the mean, the median, and
the mode fall at different points in the distribution, and the balance
(or center of gravity) is shifted to one side or the other, to right or
left. It is important to know (1) whether the skewness which often
occurs in distributions of test scores and other measures represents a
real divergence from the normal form; or (2) whether such divergence
is the result of chance fluctuations, arising from temporary causes,
and is not significant of real discrepancy. The degree of displacement
or skewness in a frequency distribution may be determined by the
formula

$$Sk = \frac{3(\text{mean} - \text{median})}{\sigma} \qquad (19)$$

(a measure of skewness in a frequency distribution)


In a normal distribution the mean equals the median and the skewness
is 0. The more nearly the distribution approaches the normal form, the
closer together are the mean and the median, and the less the
skewness. Distributions are said to be skewed negatively, or to the
left, when the scores are massed at the high end of the scale (the
right end), and spread out gradually at the low or left end, as shown
in Figure 23. Distributions are skewed positively, or to the right,
when the scores are massed at the low (the left) end of the scale, and
spread out gradually toward the high or right end as shown in
Figure 24.

FIG. 23 Negative skewness: to the left (mean to the left of median)

FIG. 24 Positive skewness: to the right (mean to the right of median)

If we apply formula (19) to the distribution of 50 Army Alpha
scores in Table 1, page 5, −.28 is obtained as a measure of skewness.
This result points to a slight negative skewness in the data, which
may be seen by reference to Figure 2, page 11. Formula (19) gives the
measure of skewness for the distribution of the 200 cancellation
scores (Table 3, page 13) as .009. This negligible degree of positive
skewness shows how closely this distribution approaches the
symmetrical probability form.

Another measure of skewness is given by the formula

$$Sk = \frac{P_{90} + P_{10}}{2} - P_{50} \qquad (20)$$

(a measure of skewness in terms of percentiles) *


For the normal distribution Sk by formula (20) is zero: P₅₀ lies
just midway between P₉₀ and P₁₀.

Applying this formula to the distributions of 50 Army Alpha scores
and 200 cancellation scores, we obtain for the first Sk = −2.50; and
for the second Sk = .03. These results are numerically different from
the measures of skewness obtained from formula (19), because the two
measures of skewness are computed from different reference values in
the distribution, and hence are not directly comparable. The two
formulas agree, however, in indicating some negative skewness for the
distribution of 50 Alpha scores, and an insignificant degree of
positive skewness for the 200 cancellation scores. In comparing the
skewness of two distributions we should use either formula (19) or
(20); not first the one and then the other.
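The two measures of skewness can be sketched for a list of raw scores as follows. This is an illustration under stated assumptions: the score lists are hypothetical (they are not the Army Alpha or cancellation data), σ is taken as the standard deviation with N in the denominator, and the percentiles come from `statistics.quantiles` applied to raw scores, which will not reproduce exactly the percentiles computed from a grouped frequency distribution.

```python
from statistics import mean, median, pstdev, quantiles

def skewness_19(scores):
    """Formula (19): Sk = 3 (mean - median) / sigma."""
    return 3 * (mean(scores) - median(scores)) / pstdev(scores)

def skewness_20(scores):
    """Formula (20): Sk = (P90 + P10) / 2 - P50."""
    # Nine cut points P10, P20, ..., P90 computed from the raw scores.
    deciles = quantiles(scores, n=10, method="inclusive")
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 + p10) / 2 - p50

# Hypothetical scores, chosen only to show the sign of each measure.
symmetrical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
massed_low = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15]    # tail toward the high end
massed_high = [16 - s for s in massed_low]       # mirror image: tail toward the low end

for name, scores in [("symmetrical", symmetrical),
                     ("massed low", massed_low),
                     ("massed high", massed_high)]:
    print(name, round(skewness_19(scores), 3), round(skewness_20(scores), 3))
```

The two formulas give different numerical values for the same scores but agree in sign, which is why a comparison of two distributions should keep to one formula throughout.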

The important question of how much skewness a distribution must
exhibit before it may be said to be significantly skewed cannot be
answered until we have calculated a “standard error” of our measure
of skewness. A formula for the standard error of Sk, when determined
by formula (20), and a method of testing whether the skewness of a
given distribution is significant are discussed in Chapter 9,
page 241.

* Kelley, T. L., Statistical Method (New York: Macmillan, 1923),
p. 77. The terms in this formula, as given by Kelley, have been
reversed so that the sign of Sk will agree with the conventional
notion of positive and negative skewness.


2. Kurtosis


The term kurtosis refers to the “peakedness” or flatness of a
frequency distribution as compared with the normal. A frequency
distribution more peaked than the normal is said to be leptokurtic;
one flatter than the normal, platykurtic. Figure 25 shows a
leptokurtic distribution and a platykurtic distribution plotted on the
same diagram around the same mean. A normal curve (called mesokurtic)
has also been drawn in on the diagram to bring out the contrast in
the figures, and to make comparison easier. A formula for measuring
kurtosis is:

$$Ku = \frac{Q}{P_{90} - P_{10}} \qquad (21)$$

(a measure of kurtosis in terms of percentiles)

FIG. 25 Leptokurtic (A), normal or mesokurtic (B), and platykurtic
(C) curves


For the normal curve, formula (21) gives Ku = .263.* If Ku is
greater than .263 the distribution is platykurtic; if less than .263
the distribution is leptokurtic. Calculating the kurtosis of the
distributions of fifty Alpha scores and 200 cancellation scores,
discussed above, we obtain Ku = .237 for the first distribution, and
Ku = .225 for the second. Both distributions, therefore, are slightly
leptokurtic. To determine whether the kurtosis in a distribution is
significant, that is, whether the curve is too high or too flat to be
treated as sensibly normal, we must evaluate Ku in terms of its
standard error. A formula for the standard error of Ku, and a method
of determining the significance of an obtained measure of Ku will be
given in Chapter 9, page 242.

* In the normal curve, Q = .6745σ, P₉₀ = 1.28σ, and P₁₀ = −1.28σ.
Hence by formula (21), Ku = .6745σ/2.56σ = .263.
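The reference value .263 can be recovered from the normal curve itself. The sketch below is an illustration, not the author's procedure: it assumes Python's `statistics.NormalDist`, takes Q as half the distance between the first and third quartiles, and introduces the hypothetical helper names `kurtosis_21` and `describe`.

```python
from statistics import NormalDist

def kurtosis_21(q, p90, p10):
    """Formula (21): Ku = Q / (P90 - P10)."""
    return q / (p90 - p10)

def describe(ku, normal_value=0.263):
    """Label a Ku value relative to the value for the normal curve."""
    if ku > normal_value:
        return "platykurtic"
    if ku < normal_value:
        return "leptokurtic"
    return "mesokurtic"

# Theoretical value for the normal curve, in sigma units.
std_normal = NormalDist()
q_normal = (std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)) / 2
ku_normal = kurtosis_21(q_normal, std_normal.inv_cdf(0.90), std_normal.inv_cdf(0.10))
print(round(ku_normal, 3))

# The two values reported in the text.
print(describe(0.237), describe(0.225))
```

Whether a departure from .263 is more than a chance fluctuation still depends on the standard error of Ku, which this sketch does not compute.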


3. Comparing a given histogram or frequency polygon with a normal
curve of the same area, M and σ


In this section methods will be described for superimposing on a
given histogram or frequency polygon a normal curve of the same N,
M, and σ as the actual distribution. Such a normal curve is the
“best fitting” normal distribution for the given data. The research
worker often wishes to compare his distribution “by eye” with that
normal curve which “best fits” the data, and such a comparison may
profitably be made even if no measures of divergence from normality
are computed. In fact, the direction and extent of asymmetry often
strike us more convincingly when seen in a graph than when expressed
by measures of skewness and kurtosis. It may be noted that a normal
curve can always be readily constructed by following the procedures
given here, provided the area (N) and variability (σ) are known.


TABLE 16 Frequency distribution of the scores made by 206 freshmen
on the Thorndike Intelligence Examination

Mean = 81.59
Median = 81.00
σ = 12.14


Table 16 shows the frequency distribution of scores made on the
Thorndike Intelligence Examination by 206 college freshmen. The
mean is 81.59, the median 81.00, and the σ 12.14. This frequency
distribution has been plotted in Figure 26, and over it, on the same
axes, has been drawn in the best-fitting normal curve, i.e., the normal
curve which best describes these data. The Thorndike scores are
represented by a histogram instead of by a frequency polygon in order
to prevent coincidence of the surface outlines and to bring out more
clearly agreement and disagreement at different points. To plot a
normal curve over this histogram, we first compute the height of the
maximum ordinate (y₀) or the frequency at the middle of the
distribution. The maximum ordinate (y₀) can be determined from the
equation of the normal curve given on page 94. When x in this
equation is put equal to zero (the x at the mean of the normal curve
is 0), the term $e^{-x^{2}/2\sigma^{2}}$ equals 1.00, and
$y_0 = \frac{N}{\sigma\sqrt{2\pi}}$. In the present problem,
N = 206; σ = 2.43 * (in units of class-interval), and √(2π) = 2.51;
hence y₀ = 33.8 (see Fig. 26 for calculations). Knowing y₀, we are
able to compute from Table B the heights of ordinates at given
distances from the mean. The entries in Table B give the heights of the
ordinates in the normal probability curve, at various σ-distances
from the mean, expressed as fractions of the maximum or middle
ordinate taken equal to 1.00000. To find, for example, the height of
the ordinate at ±1σ, we take the entry .60653 from the table opposite
x/σ = 1.0. This means that when the maximum central ordinate (y₀) is
1.00000, the ordinate (i.e., frequency) ±1σ removed from M is .60653;
or the frequency at ±1σ is about 61% of the maximum frequency at the
middle of the distribution. In Figure 26 the ordinates ±1σ from M are
.60653 × 33.8 (y₀) or 20.5. The ordinates ±2σ from M are .13534 × 33.8
or 4.6; and the ordinates ±3σ from M are .01111 × 33.8 or .4.

* σ = 2.43 × 5 (interval). The σ in interval units is used in the
equation, since the units on the X-axis are in terms of
class-intervals.

FIG. 26 Frequency distribution of the scores of 206 freshmen on the
Thorndike Intelligence Examination, compared with best-fitting
normal curve for same data
(For data, see Table 16.)

Normal curve ordinates at mean, ±1σ, ±2σ, ±3σ

y₀ = N / (σ√(2π)) = 206 / (2.43 × 2.51) = 33.8
±1σ = .60653 × 33.8 = 20.5
±2σ = .13534 × 33.8 = 4.6
±3σ = .01111 × 33.8 = .4

The normal curve may be sketched in without much difficulty
through the ordinates at these seven points. Somewhat greater accuracy
may be obtained if various intermediate ordinates, for example, at
±.5σ, ±1.5σ, etc., are also plotted. The ordinates for the curve in
Figure 26 at ±.5σ are .88250 × 33.8 or 29.8; at ±1.5σ, .32465 × 33.8
or 11.0, etc.
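The ordinates needed for sketching the best-fitting curve can also be computed rather than read from Table B. The following is a minimal sketch; the function names are illustrative assumptions, and the relative heights are obtained from the exponential term of the normal equation with σ taken as the unit.

```python
from math import sqrt, pi, exp

def max_ordinate(n_cases, sigma_in_intervals):
    """Height of the curve at the mean: N / (sigma * sqrt(2 * pi))."""
    return n_cases / (sigma_in_intervals * sqrt(2 * pi))

def relative_height(z):
    """Ordinate at z sigma-units from the mean, as a fraction of the maximum ordinate."""
    return exp(-z**2 / 2)

# N = 206 and sigma = 2.43 class-intervals, as in the Thorndike example.
y0 = max_ordinate(206, 2.43)
for z in (0, 0.5, 1, 1.5, 2, 3):
    print(z, round(relative_height(z), 5), round(relative_height(z) * y0, 1))
```

Each printed row gives the σ-distance, the fraction of the maximum ordinate, and the height to be plotted on either side of the mean.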


From formula (20) the skewness of our distribution of 206 scores is
found to be 1.25. This small value indicates a low degree of positive
skewness in the data. The kurtosis of the distribution by formula (21)
is .244, and the distribution appears to be slightly leptokurtic (this
is shown by the “peak” rising above the normal curve). Neither measure
of divergence, however, is significant of a “real” discrepancy between
our data and those of the normal distribution (see p. 212). On the
whole, then, the normal curve plotted in Figure 26 fits the obtained
distribution well enough to warrant our treating these data as
sensibly normal.


IV. Applications of the Normal Probability Curve 


This section will consider a number of problems which may readily
be solved if one can assume that the distributions of scores may be
treated as normal, or at least as approximately normal. Each general
problem will be illustrated by several examples. These examples are
intended to present the issues concretely, and should be carefully
worked through by the student. Constant reference will be made to
Table A; and a knowledge of how to use this table is essential.


1. To determine the percentage of cases in a normal distribution which 
fall within given limits 


Example (1) Given a normal distribution with a mean of 12, and
a σ of 4. (a) What percentage of the cases fall between 8 and 16?
(b) What percentage of the cases lie above 18? (c) Below 6?


(a) A score of 16 * is four points above the mean, and a score of 8
is four points below the mean. If we divide this scale distance of four
score units by the σ of the distribution (i.e., by 4) it is clear that
16 is 1σ above the mean, and that 8 is 1σ below the mean (see Fig. 27,
below). There are 68.26% of the cases in a normal distribution
between the mean and ±1σ (Table A). Hence, 68.26% of the scores in
this distribution, or approximately the middle two-thirds, fall
between 8 and 16. This result may also be stated in terms of
“chances.” Since 68.26% of the cases in the given distribution fall
between 8 and 16, the chances are about 68 in 100 that any score in
the distribution will be found between these points.

FIG. 27 Baseline points: 5.5, 8, 12 (mean), 16, 18.5

(b) The upper limit of a score of 18, namely, 18.5, is 6.5 score units
or 1.625σ above the mean (6.5/4 = 1.625). From Table A we find that
44.79% of the cases in the entire distribution fall between the mean
and 1.625σ. Accordingly, 5.21% of the cases (50.00 − 44.79) must lie
above the upper limit of 18 (viz., 18.5) in order to fill out the 50%
of cases in the upper half of the normal curve (Fig. 27). In terms of
chances, there are about 5 chances in 100 that any score in the
distribution will be larger than 18.
(c) The lower limit of a score of 6, namely 5.5, is −1.625σ from the
mean. Between the mean and 5.5 (−1.625σ) are 44.79% of the cases in
the whole distribution. Hence, about 5% of the cases in the
distribution lie below 5.5 (filling out the 50% below the mean), and
the chances are about 5 in 100 that any score in the distribution
will be less than 6, i.e., below the lower limit of score 6.

* A score of 16 is the midpoint of the interval 15.5 to 16.5.

Example (2) Given a normal distribution with a mean of 29.75
and a σ of 6.75. What percentage of the distribution will lie
between 22 and 26? What are the chances that a score will be
between these two points?

A score of 22 * is 7.75 score units or −1.15σ (7.75/6.75 = 1.15)
from the mean; and a score of 26 is 3.75 or −.56σ from the mean
(Fig. 28). We know from Table A that 37.49% of the cases in a normal
distribution lie between the mean and −1.15σ; and that 21.23% of the
cases lie between the mean and −.56σ. By simple subtraction,
therefore, 16.26% of the cases fall between −1.15σ and −.56σ or
between the scores 22 and 26. The chances are 16 in 100 that any
score in the distribution will lie between these two points.

* A score of 22 is the midpoint of the interval 21.5 to 22.5.

FIG. 28 Baseline in σ-units from −3σ to +3σ, with the mean (29.75)
and the scores 22 and 26 marked
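Both examples in this subsection can be checked by computation. The sketch below is illustrative and assumes Python's `statistics.NormalDist`; the helper names are not from the text. Because the text rounds σ-distances to two decimals before entering Table A, the computed percentage for Example (2) differs slightly from 16.26.

```python
from statistics import NormalDist

def percent_between(low, high, mean, sigma):
    """Percentage of a normal distribution lying between two score points."""
    dist = NormalDist(mean, sigma)
    return 100 * (dist.cdf(high) - dist.cdf(low))

def percent_above(point, mean, sigma):
    """Percentage of a normal distribution lying above a score point."""
    return 100 * (1 - NormalDist(mean, sigma).cdf(point))

def percent_below(point, mean, sigma):
    """Percentage of a normal distribution lying below a score point."""
    return 100 * NormalDist(mean, sigma).cdf(point)

# Example (1): mean 12, sigma 4.
print(round(percent_between(8, 16, 12, 4), 2))   # (a) between 8 and 16
print(round(percent_above(18.5, 12, 4), 2))      # (b) above the upper limit of 18
print(round(percent_below(5.5, 12, 4), 2))       # (c) below the lower limit of 6

# Example (2): mean 29.75, sigma 6.75.
print(round(percent_between(22, 26, 29.75, 6.75), 2))
```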


2. To find the limits in any normal distribution which will include a given 
percentage of the cases 


Example (1) Given a normal distribution with a mean of 16.00
and a σ of 4.00. What limits will include the middle 75% of the
cases?


The middle 75% of the cases in a normal distribution must include
the 37.5% just above, and the 37.5% just below the mean. From
Table A we find that 3749 cases in 10,000, or 37.5% of the
distribution, fall between the mean and 1.15σ; and, of course, 37.5%
of the distribution also fall between the mean and −1.15σ. The middle
75% of the cases, therefore, lie within ±1.15σ of the mean; or, since
σ = 4.00, within ±4.60 score units of the mean. Adding ±4.60 to the
mean (to 16.00), we find that the middle 75% of the scores in the
given distribution lie between 20.60 and 11.40 (see Fig. 29, below).

FIG. 29 Baseline points: 11.40, 16.00 (mean), 20.60; σ = 4.00


Example (2) Given a normal distribution with a median of 150.00
and a PE (Q) of 17. What limits will include the highest 20% of the
distribution? the lowest 10%?


We know from page 97 that σ = 1.4826 PE; hence the σ of this
distribution is 25.20 (1.4826 × 17). The highest 20% of a normally
distributed group will have 30% of the cases between its lower limit
and the median, since 50% of the cases lie in the right half of the
distribution. From Table A we know that 2995 cases in 10,000 or 30%
of the distribution are between the median and .84σ. Since the σ of
the given distribution is 25.20, .84σ will be .84 × 25.20 or 21.17
score units above the median, or at 171.17. The lower limit of the
highest 20% of the given group, therefore, is 171.17; and the upper
limit is the highest score in the distribution, whatever that may be.

The lowest 10% of a normally distributed group will have 40% of
the cases between the median and its upper limit. Almost exactly 40%
of the distribution fall between the median and −1.28σ. Hence, since
σ = 25.20, −1.28σ must lie at −1.28 × 25.20 or 32.26 score units
below the median, that is, at 117.74. The upper limit of the lowest
10% of scores in the group, accordingly, is 117.74; and the lower
limit is the lowest score in the distribution.
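Finding limits from percentages is the inverse of the Table A lookup, and can be sketched with the inverse of the cumulative normal curve. The helper names below are illustrative assumptions. The computed limits differ by a few hundredths from those in the text, since the text uses σ-distances rounded to two decimals (.84 and 1.28).

```python
from statistics import NormalDist

def middle_limits(percent, mean, sigma):
    """Lower and upper score limits enclosing the middle `percent` of the cases."""
    tail = (1 - percent / 100) / 2
    dist = NormalDist(mean, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

def lower_limit_of_top(percent, mean, sigma):
    """Score above which the highest `percent` of the cases lie."""
    return NormalDist(mean, sigma).inv_cdf(1 - percent / 100)

def upper_limit_of_bottom(percent, mean, sigma):
    """Score below which the lowest `percent` of the cases lie."""
    return NormalDist(mean, sigma).inv_cdf(percent / 100)

# Example (1): mean 16.00, sigma 4.00, middle 75%.
low, high = middle_limits(75, 16.00, 4.00)
print(round(low, 2), round(high, 2))

# Example (2): median 150.00, PE 17, so sigma = 1.4826 * PE.
sigma = 1.4826 * 17
print(round(lower_limit_of_top(20, 150.00, sigma), 2))
print(round(upper_limit_of_bottom(10, 150.00, sigma), 2))
```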


3. To compare two distributions in terms of "overlapping" 


Example (1) Given the distributions of the scores made on a
logical memory test by 300 boys and 250 girls (Table 17). The
boys’ mean score is 21.49 with a σ of 3.63. The girls’ mean score
is 23.68 with a σ of 5.12. The medians are: boys, 21.41, and girls,
23.66. What percentage of boys exceed the median of the girls’
distribution?

On the assumption that these distributions are sensibly normal, we
may solve this problem by means of Table A. The girls’ median is
23.66 − 21.49 or 2.17 score units above the boys’ mean. Dividing
2.17 by 3.63 (the σ of the boys’ distribution), we find that the girls’
median is .60σ above the mean of the boys’ distribution. Table A
shows that 23% of a normal distribution lie between the mean and
.60σ; hence 27% of the boys (50% − 23%) exceed the girls’
median.

This problem may also be solved by direct calculation from the
distributions of boys’ and girls’ scores without any assumption as to
normality of distribution. The calculations are shown in Table 17;
and it will be interesting to compare the result found by direct
calculation with that obtained by use of the probability tables. The
problem is to find the number of boys whose scores exceed 23.66, the
girls’ median, and then to express this number as a percentage. There
are 217 boys who score up to 23.5 (lower limit of interval 23.5 to
27.5). The class-interval 23.5 to 27.5 contains 68 scores; hence
there are 68/4 or 17 scores per scale unit on this interval. We wish
to reach 23.66 in the boys’ distribution. This point is .16 of a
score unit (23.66 − 23.50 = .16) above 23.5, and so takes in 2.72
(i.e., 17 × .16) of the scores on this interval. Adding 2.72 to 217,
we find that 219.72 of the boys’ scores fall below 23.66, the girls’
median. Since 300 − 219.72 = 80.28, it is clear that 80.28 ÷ 300 or
26.76% (approximately 27%) of the boys exceed the girls’ median. This
result is in almost perfect agreement with that obtained above.
Apparently the assumption of normality of distribution for the boys’
scores was justified.


TABLE 17 To illustrate the method of determining overlapping by direct
calculation from the distribution

Boys                       Girls
Scores            f        Scores            f
27.5 to 31.5     15        31.5 to 35.5     20
23.5 to 27.5     68        27.5 to 31.5     35
19.5 to 23.5    128        23.5 to 27.5     73
15.5 to 19.5     79        19.5 to 23.5     68
11.5 to 15.5     10        15.5 to 19.5     41
                           11.5 to 15.5     13
N = 300                    N = 250
N/2 = 150                  N/2 = 125
Mdn = 19.5 + (61/128) × 4  Mdn = 23.5 + (3/73) × 4
    = 21.41                    = 23.66
M = 21.49                  M = 23.68
σ = 3.63                   σ = 5.12

What percent of the boys exceed 23.66, the median of the girls? First,
217 boys make scores below 23.5. The class-interval 23.5 to 27.5
contains 68 scores; hence, there are 68/4 or 17 scores per scale unit
on this interval.

The girls’ median, 23.66, is .16 above 23.5, lower limit of interval
23.5 to 27.5. If we multiply 17 (number of scores per scale unit) by
.16 we obtain 2.72, which is the distance we must go into interval
23.5 to 27.5 to reach 23.66.

Adding 217 and 2.72, we obtain 219.72 as that part of the boys’
distribution which falls below the point 23.66 (girls’ median). N is
300; hence 300 − 219.72 gives 80.28 as that part of the boys’
distribution which lies above 23.66. Dividing 80.28 by 300, we find
that .2676, or approximately 27%, of the boys exceed the girls’
median.
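The two ways of measuring overlapping can be placed side by side in a short sketch. It is illustrative only: the function names are assumptions, the boys’ frequencies are those of Table 17, and scores are taken to be spread evenly within each class-interval, as in the direct calculation described in the text.

```python
from statistics import NormalDist

# Boys' distribution from Table 17: (lower limit, upper limit, frequency).
boys = [(11.5, 15.5, 10), (15.5, 19.5, 79), (19.5, 23.5, 128),
        (23.5, 27.5, 68), (27.5, 31.5, 15)]

def percent_above_direct(intervals, point):
    """Percent of cases above `point`, interpolating within its class-interval."""
    total = sum(f for _, _, f in intervals)
    below = 0.0
    for low, high, f in intervals:
        if point >= high:
            below += f                                  # whole interval lies below
        elif point > low:
            below += f * (point - low) / (high - low)   # part of the interval lies below
    return 100 * (total - below) / total

def percent_above_normal(point, mean, sigma):
    """Percent of cases above `point` on the assumption of normality."""
    return 100 * (1 - NormalDist(mean, sigma).cdf(point))

girls_median = 23.66
print(round(percent_above_direct(boys, girls_median), 2))
print(round(percent_above_normal(girls_median, 21.49, 3.63), 2))
```

The first figure makes no assumption about the form of the distribution; the second depends on the boys’ scores being sensibly normal.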

The agreement between the percentage of overlapping found by
direct calculation from the distribution and that found by use of the
probability tables will nearly always be close, especially if the groups
are large and the distributions fairly symmetrical. When the
overlapping distributions are small and not very regular in outline,
it is safer to use the method of direct calculation, since no
assumption as to form of distribution is then made.


4. To determine the relative difficulty of test questions, problems, and
other test items


Example (1) Given a test question or problem solved by 10% of 
a large unselected group; a second problem solved by 20% of the 
same group; and a third problem solved by 30%. If we assume the 
capacity measured by the test problems to be distributed normally, 
what is the relative difficulty of questions 1, 2, and 3? 


Our first task is to find for Question 1 a position in the distribution
such that 10% of the entire group (the percent passing) lie above,
and 90% (the percent failing) lie below the given point. The highest
10% in a normally distributed group has 40% of the cases between
its lower limit and the mean (see Fig. 30, above). From Table A we
find that 39.97% (i.e., 40%) of a normal distribution fall between
the mean and 1.28σ. Hence, Question 1 belongs at a point on the
baseline of the curve, a distance of 1.28σ from the mean; and,
accordingly, 1.28σ may be set down as the difficulty value of this question.

Question 2, passed by 20% of the group, falls at a point in the
distribution 30% above the mean. From Table A it is found that
29.95% (i.e., 30%) of the group fall between the mean and .84σ;
hence, Question 2 has a difficulty value of .84σ. Question 3, which
lies at a point in the distribution 20% above the mean, has a difficulty
value of .52σ, since 19.85% of the distribution fall between the mean
and .52σ. To summarize our results:


Question    Passed by    σ-value    σ-difference
   1           10%         1.28          —
   2           20%          .84         .44
   3           30%          .52         .32


The σ-difference in difficulty between Questions 2 and 3 is .32, which
is roughly 3/4 of the σ-difference in difficulty between Questions 1
and 2 (.44). Since the percentage difference is the same in the two
comparisons, it is evident that when ability is assumed to follow the
normal distribution, σ and not percentage differences are the better
indices of differences in difficulty.
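
The table look-up in Example (1) can be reproduced numerically. The
following is a minimal sketch, not the author's procedure: it uses the
inverse normal function from Python's standard library in place of Table A,
and the function name `difficulty_sigma` is hypothetical.

```python
from statistics import NormalDist


def difficulty_sigma(percent_passing):
    """Sigma-distance from the mean of the point that `percent_passing` lie above."""
    return NormalDist().inv_cdf(1 - percent_passing / 100)


# Questions 1, 2 and 3 are passed by 10%, 20% and 30% of the group.
values = [round(difficulty_sigma(p), 2) for p in (10, 20, 30)]

# Difference in difficulty between successive questions, in sigma units.
differences = [round(a - b, 2) for a, b in zip(values, values[1:])]

print(values, differences)
```

Equal steps of 10% in percent passing give unequal steps on the σ scale,
which is the point of the example.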


Example (2) Given three test items, 1, 2, and 3, passed by 50%, 
40%, and 30%, respectively, of a large group. On the assumption 
of normality of distribution, what percentage of this group must 
pass test item 4, in order for it to be as much more difficult than 
3, as 2 is more difficult than 1?


An item passed by 50% of a group is, of course, failed by 50%;
and, accordingly, such an item falls exactly in the middle of a normal
distribution of “difficulty.” Test item 1, therefore, has a σ-value of
.00, since it falls exactly at the mean (Fig. 31). Test item 2 lies at a
point in the distribution 10% above the mean, since 40% of the
group passed and 60% failed this item. Accordingly, the σ-value of
item 2 is .25, since from Table A we find that 9.87% (roughly 10%)
of the cases lie between the mean and .25σ. Test item 3, passed by
30% of the group, lies at a point 20% above the mean, and this item
has a difficulty value of .52σ, as 19.85% (20%) of the normal
distribution fall between the mean and .52σ.

FIG. 31 (baseline marked at .25σ, .52σ, and .77σ)

Since item 2 is .25σ farther along on the difficulty scale (toward
the high-score end of the curve) than item 1, it is clear that item 4
must be .25σ above item 3, if it is to be as much harder than item 3
as item 2 is harder than item 1. Item 4, therefore, must have a value
of .52σ + .25σ or .77σ; and from Table A we find that 27.94% (28%)
of the distribution fall between the mean and this point. This means
that 50% − 28% or 22% of the group must pass item 4. To summarize:


Test Item    Passed by    σ-value    σ-difference
    1           50%          .00          —
    2           40%          .25         .25
    3           30%          .52          —
    4           22%          .77         .25


A test item, therefore, must be passed by 22% of the group in order
for it to be as much more difficult than an item passed by 30%, as an
item passed by 40% is more difficult than one passed by 50%. Note
again that percentage differences are not reliable indices of differences
in difficulty when the capacity measured is distributed normally.
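
Example (2) runs the look-up in both directions: from percent passing to a
σ-value, and back again. A short sketch under the same normality
assumption is given below; the helper names are hypothetical, and because
the code works with unrounded values the σ-value of item 4 may differ in
the second decimal from the .77σ obtained with the rounded table entries.

```python
from statistics import NormalDist

unit = NormalDist()  # standard normal curve: mean 0, sigma 1


def sigma_value(percent_passing):
    """Sigma-value of the point above which `percent_passing` of cases lie."""
    return unit.inv_cdf(1 - percent_passing / 100)


def percent_passing(sigma):
    """Percent of cases lying above a given sigma-value."""
    return 100 * (1 - unit.cdf(sigma))


step = sigma_value(40) - sigma_value(50)   # item 2 is this much harder than item 1
item4_sigma = sigma_value(30) + step       # item 4 is the same step above item 3
item4_percent = percent_passing(item4_sigma)

print(round(item4_sigma, 2), round(item4_percent))
```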


5. To separate a given group into sub-groups according to capacity,
when the trait is normally distributed


Example (1) Suppose that we have administered a certain examination
to 100 college students. We wish to classify our group
into five sub-groups A, B, C, D, and E according to ability, the
range of ability to be equal in each sub-group. On the assumption
that the trait measured by our examination is normally distributed,
how many students should be placed in groups A, B, C,
D, and E?


Let us first represent the positions of the five sub-groups
diagrammatically on a normal curve as shown in Figure 32, below. If the
baseline of the curve is considered to extend from −3σ to +3σ, that
is, over a range of 6σ, dividing this range by 5 (the number of
sub-groups) gives 1.2σ as the baseline extent to be allotted to each group.
These five intervals may be laid off on the baseline as shown in the
figure, and perpendiculars erected to demarcate the various
sub-groups. Group A covers the upper 1.2σ; group B the next 1.2σ;
group C lies .6σ to the right and .6σ to the left of the mean; groups
D and E occupy the same relative positions in the lower half of the
curve that B and A occupy in the upper half.


FIG. 32 


To find what percentage of the whole group belongs in A we must
find what percentage of a normal distribution lies between 3σ (upper
limit of the A group) and 1.8σ (lower limit of the A group). From
Table A 49.86% of a normal distribution is found to lie between the
mean and 3σ; and 46.41% between the mean and 1.8σ. Hence, 3.5%
of the total area under the normal curve (49.86% − 46.41%) lie
between 3σ and 1.8σ; and, accordingly, group A comprises 3.5% of
the whole group.

The percentages in the other groups are calculated in the same
way. Thus, 46.41% of the normal distribution fall between the mean
and 1.8σ (upper limit of group B) and 22.57% fall between the
mean and .6σ (lower limit of group B). Subtracting, we find that
46.41% − 22.57% or 23.84% of our distribution belong in sub-group
B. Group C lies from .6σ above to .6σ below the mean.
Between the mean and .6σ are 22.57% of the normal distribution, and
the same percent lies between the mean and −.6σ. Group C, therefore,
includes 45.14% (22.57 × 2) of the distribution. Finally, sub-group
D, which lies between −.6σ and −1.8σ, contains exactly the
same percentage of the distribution as sub-group B; and group E,
which lies between −1.8σ and −3σ, contains the same percent of the
whole distribution as group A. The percentage and number of men
in each group are given in the following table:


                                            Groups
                                    A       B      C      D       E
Percent of total in each group     3.5    23.8    45    23.8     3.5
Number in each group              4 or 3   24     45     24     4 or 3

(100 men in all)
On the assumption that the capacity measured follows the normal
curve, it is clear that three or four men in our group of 100 should be
placed in group A, the “marked” ability group; twenty-four in
group B, the “high average” ability group; forty-five in group C,
the “average” ability group; twenty-four in group D, the “low average”
ability group; and three or four in group E, the “very low” or
“inferior” group.

The above procedure may be used to determine how many students
in a class should be assigned to each of any given number of
grade-groups. It must be remembered that the assumption is made that
performance in the subject matter upon which the individuals are
being marked is represented by the normal curve. The larger and
more unselected the group the more nearly is this assumption
justified.
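
The same subdivision can be carried out for any number of sub-groups. The
sketch below is illustrative only: the function name `equal_range_groups`
is hypothetical, areas are computed directly instead of being read from
Table A, and, as in the text, the baseline is taken to run from −3σ to +3σ,
so the percentages add to slightly less than 100.

```python
from statistics import NormalDist


def equal_range_groups(k, span=3.0):
    """Percent of a normal distribution in each of k equal baseline ranges
    laid off from -span sigma to +span sigma (lowest group first)."""
    unit = NormalDist()
    width = 2 * span / k                          # baseline extent per group
    edges = [-span + i * width for i in range(k + 1)]
    return [100 * (unit.cdf(hi) - unit.cdf(lo))
            for lo, hi in zip(edges, edges[1:])]


percents = equal_range_groups(5)                  # groups E, D, C, B, A
print([round(p, 1) for p in percents])
```

Calling `equal_range_groups(10)` in the same way would give the split for
ten grade-groups, each covering .6σ of the baseline.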


V. Why Frequency Distributions Deviate from the
Normal Form


It is often important for the research worker to know why his
distributions diverge from the normal form, and this is especially true
when the deviation from normality is large and significant (p. 212).
The reasons why distributions exhibit skewness and kurtosis are
numerous and often complex, but a careful analysis of the data will
often permit the setting up of hypotheses concerning non-normality
which may be tested experimentally. Common causes of asymmetry,
all of which must be taken into consideration by the careful
experimenter, will be summarized in the present section.

1. Unrepresentative or biased sampling


Selection is a potent cause of skewness. We should hardly expect 
the distribution of I.Q.’s obtained from a group of twenty-five ten- 
year-old boys (all superior students) to be normal; nor would we 
look for symmetry in the distribution of I.Q.’s got from a special 
class of dull-normal ten-year-old boys, even though the group were 
fairly large. Neither of these groups is an unbiased selection (i.e., a
cross-section) from the population of ten-year-old boys; and in addi- 
tion, the first group is quite small. A small sample is not necessarily 
unrepresentative, but more often than not it is apt to be. 

Selection will produce skewness and kurtosis in distributions even 
when the test has been adequately constructed and carefully admin- 
istered. For example, a group of elementary school pupils which con- 
tains (a) a large proportion of bilinguals, (b) many children of very 
low or very high socio-economic status, (c) a large number of pupils 
over-age for grade or accelerated, will almost surely return skewed 
distributions of test scores even upon standard intelligence and
educational achievement examinations.

Scores made by small and homogeneous groups are likely to yield 
leptokurtic distributions; while scores from large and heterogeneous 
groups are more likely to be platykurtic. The distribution of scores 
achieved upon an educational examination by pupils throughout the 
elementary grades, as well as the distribution of chronological ages 
for these same pupils, will probably be somewhat flattened owing to 
the considerable overlap from grade to grade. 

Distributions of physical traits, such as height, weight, and 
strength, are also affected by selection. Measurements of physical 
traits in large groups of the same age, sex, and race will closely 
approximate the normal form (p. 85). But the distribution of height 
for fourteen-year-old girls in the high school of a small city, or the
distribution of weight for freshmen in a midwestern college, will probably
be skewed, as these groups are subject to selection in various
traits related to height and weight.


2. Use of unsuitable or poorly made tests


If a test is too easy, scores will pile up at the high-score end of the
distribution, while if the test is too hard scores will pile up at the
low-score end. Imagine, for example, that an examination in arithmetic
which requires only addition, subtraction, multiplication, and division,
has been given to 1000 seventh graders. The resulting distribution
will almost certainly be badly skewed to the left (see Fig. 23).
On the other hand, if the examination contains only problems in
complex fractions, interest, square root, and the like, the score
distribution is likely to be positively skewed: low scores will be more
numerous than intermediate or high scores. It is probable also that
both distributions will be somewhat more “peaked” (leptokurtic)
than the normal.

Asymmetry in cases like these may be explained in terms of those
small positive and negative factors which determine the normal
distribution. Too easy a test excludes from operation some of the factors
which would make for an extension of the curve at the upper end,
such as knowledge of more advanced arithmetical processes which
the brighter child would know. Too hard a test excludes from operation
factors which make for the extension of the distribution at the
low end, such as knowledge of those very simple facts which would
have permitted the answering of a few at least of the easier questions
had these been included. In the first case we have a number of perfect
scores and little discrimination; in the second case a number of
zero scores and equally poor differentiation. Besides the matter of
difficulty in the test, asymmetry may be brought about by ambiguous
or poorly made items and by other technical faults.*


3. The measurement of traits the distributions of which are not normal 


Skewness or kurtosis or both may also appear owing to a real lack
of normality in the trait being measured. Non-normality of distribution
will arise, for instance, when some of the hypothetical factors
determining performance in a trait are dominant or prepotent
over the others, and hence are present more often than chance will
allow. Illustrations may be found in distributions resulting from the
throwing of loaded dice. When off-center or biased dice are cast the


* Hawkes, Lindquist, and Mann, The Construction and Use of Achievement
Examinations (Boston: Houghton Mifflin Co., 1936), Chapters II and III.

There is no reason why all distributions should approach the normal form.
Thorndike has written: “There is nothing arbitrary or mysterious about
variability which makes the so-called normal type of distribution a
necessity, or any more rational than any other sort, or even more to be
expected on a priori grounds. Nature does not abhor irregular
distributions.” Theory of Mental and Social Measurement (New York:
Teachers College, 1913), pp.
resulting distribution will certainly be skewed and probably peaked,
owing to the greater likelihood of combinations of faces yielding
extreme scores. The same is true of biased coins. Suppose, for
example, that the probability of “success” (appearance of H) is four
times the probability of failure (non-occurrence of H, or presence of
T), so that $p = 4/5$, $q = 1/5$, and $(p + q) = 1.00$. If we think of the
factors making for success or failure as 3 in number, we may expand
$(p + q)^3$ to find the incidence of success and failure in varying
degree. Thus, $(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$, and substituting
$p = 4/5$ and $q = 1/5$, we have
(1)  $p^3 = (4/5)^3 = 64/125$
     $3p^2q = 3(4/5)^2(1/5) = 48/125$
     $3pq^2 = 3(4/5)(1/5)^2 = 12/125$
     $q^3 = (1/5)^3 = 1/125$

(2)  Expressed as a frequency distribution:

     “Successes”     f
          3         64
          2         48
          1         12
          0          1
                N = 125


The numerators of the probability ratios (frequency of success) may
be plotted in the form of a histogram to give Figure 33.


FIG. 33 Histogram of the expansion $(p + q)^3$, where $p = 4/5$, $q = 1/5$.
p is the probability of success, q the probability of failure. (Baseline:
number of successes, 0 to 3.)

FIG. 34 U-shaped frequency curve
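
The expansion for the biased coin can be checked term by term. The lines
below are a minimal sketch (the function name `binomial_terms` is
hypothetical); exact fractions are used so that the numerators over
N = 125 come out as whole numbers.

```python
from fractions import Fraction
from math import comb


def binomial_terms(p, n):
    """Probability of 0, 1, ..., n successes among n independent factors."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


# p = 4/5 (success), q = 1/5 (failure), three factors.
terms = binomial_terms(Fraction(4, 5), 3)

# Frequencies for 0, 1, 2 and 3 successes when N = 125.
frequencies = [int(t * 125) for t in terms]
print(frequencies)
```

Replacing `Fraction(4, 5)` with `Fraction(1, 2)` gives the symmetrical
expansion for an unbiased coin, which makes the source of the skewness
plain.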


Note that this distribution is negatively skewed (to the left); that
the incidence of three “successes” is 64, of two 48, of one 12, and of
none 1. J-shaped distributions like these are essentially non-normal.
Such curves have been most often found by psychologists to describe
certain forms of social behavior. For example, suppose that we tabulate
the number of students who appear at a lecture “on time”; and
the number who come in five, ten, and fifteen-plus minutes late. If
frequency of arrival is plotted against time, the distribution will be
highest at zero (“on time”) on the Y-axis and will fall off rapidly as
we go to the right, i.e., will be positively skewed and J-shaped (see
Fig. 24). If only the early-comers are tallied, up to the “on time”
group, the curve will be negatively skewed like those in Figures 23
and 33. J-curves describe behavior which is essentially non-normal
in occurrence because the causes of the behavior differ greatly in
strength. But J-curves may also represent frequency distributions
badly skewed for other reasons. We have seen in (1) and (2) above
that selection and poorly chosen tests can produce distributions
which closely resemble J-curves.

Skewed curves often occur in medical statistics. The frequency of 
death due to degenerative disease, for instance, is highest during 
maturity and old age and minimal during the early years. If age is 
laid off on the baseline and frequency of death plotted on the Y-axis 
the curve will be negatively skewed and will resemble Figure 23 
closely. Factors making for death are prepotent over those making 
for survival as age increases, and hence the curve is essentially asym- 
metrical. In the case of a childhood disease, the occurrence of death 
will be positively skewed when plotted against age as the probability 
of death becomes less with increase in age. 

Another type of non-normal distribution, which may be briefly
described, is the U-shaped curve shown in Figure 34. U-shaped
distributions, like J-curves, are rarely encountered in mental and physical
measurement. They are sometimes found in the measurement of
social and personality traits, if the group is extremely heterogeneous
with respect to some attribute, or if the test measures a trait that is
likely to be present or absent in an all-or-none manner. Thus, in a
group composed about equally of normals and mentally ill persons,
the normals will tend to make low scores on a Neurotic Inventory
while the abnormals will tend to make high scores, with considerable
overlapping, of course. Again, in tests of suggestibility, if a
subject yields to suggestion in the first trial he is likely to be
suggestible in all trials, thus earning a high score. On the other hand,
if he resists suggestion on the first trial, he is likely to resist in all
subsequent trials, thus earning a zero (or a very low) score.* This
all-or-none feature of the score makes for a U-shaped distribution.


* See Hull, C. L., Hypnosis and Suggestibility (New York:
Appleton-Century-Crofts, 1933).
4. The influence upon distribution form of errors made in the construction
and administration of tests


Other factors besides those already mentioned make for distortions 
in score distributions. Differences in the size of the units in which a 
trait has been measured, for example, will lead to skewness. Thus, if 
the test items are very easy at the beginning and very hard later on, 
an increment of one point of score at the upper end of the test scale 
will be much greater than an increment of one point at the low end 
of the scale. The effect of such unequal or “rubbery” units is the
same as that encountered when the test is too easy: scores tend to
pile up toward the high end of the scale and be stretched out or
skewed toward the low end.


Errors in administration of a test as in timing or giving instruc- 
tions; errors in the use of scoring stencils; large differences in prac- 
tice or in motivation among the subjects—all of these factors, if they 
cause many students to score higher or lower than they normally 
would, will make for skewness in the distribution. 


PROBLEMS 


1. In two throws of a coin, what is the probability of throwing at least one
   head?

2. What is the probability of throwing exactly one head in three throws
   of a coin?

3. Five coins are thrown. What is the probability that exactly two of them
   will be heads?

4. A box contains 10 red, 20 white and 30 blue marbles. After a thorough
   shaking, a blindfolded person draws out 1 marble. What is the
   probability that
   (a) it is blue?
   (b) red or blue?
   (c) neither red nor blue?

5. If the probability of answering a certain question correctly is four
   times the probability of answering it incorrectly, what is the probability
   of answering it correctly?

6. (a) If two unbiased dice are thrown what is the probability that the
       number of spots showing will total 7?
   (b) Draw up a frequency distribution showing the occurrence of
       combinations of from 2 to 12 spots when two dice are thrown.

7. (a) In an attitude questionnaire containing 10 statements, each to be
       marked as True or False, what is the probability of getting a
       perfect score by sheer guesswork?
   (b) Suppose you know 5 statements to be True and 5 False. What is
       the probability that you will mark the right ones True (select the
       right five)?

8. A rat has five choices to make of alternate routes in order to reach the
   food-box. If it is true that for each choice the odds are two to one in
   favor of the correct pathway, what is the probability that the rat will
   make all of its choices correctly?

9. Assuming that trait X is completely determined by 6 factors, all similar
   and independent, and each as likely to be present as absent, plot the
   distribution which one might expect to get from the measurement of
   trait X in an unselected group of 1000 people.

10. Toss five pennies thirty-two times, and record the number of heads and
    tails after each throw. Plot frequency polygons of obtained and
    expected occurrences on the same axes. Compare the M's and σ's of
    obtained and expected distributions.


11. What percentage of a normal distribution is included between the
    (a) mean and 1.54σ          (d) −3.5PE and 1.0PE
    (b) mean and −2.7PE         (e) .66σ and 1.78σ
    (c) −1.73σ and .56σ         (f) −1.8PE and −2.5PE

12. In a normal distribution
    (a) Determine P27, P46, P54, and P81 in σ-units.
    (b) What are the percentile ranks of scores at −1.23σ, −.50σ, +.84σ?

13. (a) Compute measures of skewness and of kurtosis for the first two
        frequency distributions in Chapter 2, Problem 1, page 40.
    (b) Fit normal probability curves to these same distributions, using
        the method given on page 102.
    (c) For each distribution, compare the percentage of cases lying
        between ±1σ with the 68.26% found in the normal distribution.

14. Suppose that the height of the maximum ordinate (y) in a normal
    curve is 50. What is the height to the nearest integer of the ordinate
    at the x/σ point which cuts off the top 11% of the distribution? top
    30%? bottom 5%? (Use Tables A and B.)

15. In a sample of 1000 cases the mean of a certain test is 14.40 and σ is
    2.50. Assuming normality of distribution
    (a) How many individuals score between 12 and 16?
    (b) How many score above 18? below 8?
    (c) What are the chances that any individual selected at random will
        score above 15?


16. In the Army General Classification Test the distribution is essentially
    normal with M = 100 and SD = 20.
    (a) What percent of scores lie between 85 and 125?
    (b) The middle 60% fall between what two scores?
    (c) On what score does Q3 fall?

17. In a certain achievement test, the seventh-grade mean is 28.00 and
    SD is 4.80; and the eighth-grade mean is 31.60 and SD is 4.00. What
    percent of the seventh grade is above the mean of the eighth grade?
    What percent of the eighth grade is below the mean of the seventh
    grade?

18. Two years ago a group of twelve-year-olds had a reading ability
    expressed by a mean score of 40.00 and a σ of 3.60; and a composition
    ability expressed by a mean of 62.00 and a σ of 9.60. Today the group
    has gained 12 points in reading and 10.8 points in composition. How
    many times greater is the gain in reading than the gain in composition?

19. In Problem 1, Chapter 4, we computed directly from the distribution
    the percent of Group A which exceeds the median of Group B. Compare
    this value with the percentage of overlapping obtained on the
    assumption of normality in Group A.

20. Four problems, A, B, C, and D, have been solved by 50%, 60%, 70%,
    and 80%, respectively, of a large group. Compare the difference in
    difficulty between A and B with the difference in difficulty between
    C and D.

21. In a certain college, ten grades, A+, A, A−; B+, B, B−; C+, C,
    C−; and D, are assigned. If ability in mathematics is distributed
    normally, how many students in a group of 500 freshmen should receive
    each grade?

22. Assume that the distribution of grades in a class of 500 freshmen is
    normal with M = 72 and SD = 10. The instructor wants to give letter
    grades as follows: 10% A's; 30% B's; 40% C's; 15% D's; and
    5% F's. Compute to the closest score the divisions between A's and B's;
    B's and C's; C's and D's; D's and F's.


ANSWERS 


 1. 3/4          2. 3/8          3. 10/32
 4. (a) 1/2
    (b) 2/3
    (c) 1/3
 5. 4/5          6. (a) 1/6
 7. (a) 1/1024
    (b) 1/252
 8. 32/243
10. For expected distribution
    M = 2.5, σ = 1.12
11. (a) .4383    (d) .7409
    (b) .4657    (e) .2171
    (c) .6705    (f) .0665
12. (a) −.61σ, −.10σ, .10σ, .88σ
    (b) 11, 31, 80
13. (a)                 Skewness                      Kurtosis
           By formula (19)   By formula (20)     By formula (21)
     (1)       −.018              −.27                .289
     (2)       1156               1.03                .277
    (c) 66%, 67%
14. 23, 44, 13
15. (a) 570
    (b) 50; 3
    (c) 33 in 100 or 1 in 3
16. (a) 67%
    (b) 83 and 117
    (c) 113
17. 23%; 18%
18. Three times as great.
19. 39% as compared with 42%.
20. Difference between A and B is .25σ; between C and D, .32σ.
21. Grades:       A+    A   A−   B+    B   B−   C+    C   C−    D
    Students
    receiving:     3   14   40   80  113  113   80   40   14    3
22. 85; 75; 64; 56
I. The Meaning of Correlation


1. Correlation as a measure of relationship


In previous chapters we have been concerned primarily with
methods of computing statistical measures designed to represent in a
reliable way the performance of an individual or a group in some
defined trait. Frequently, however, it is of more importance to
examine the relationship of one ability to another than to measure
performance in either alone. Are certain abilities closely related, and
others relatively independent? Is it true that good pitch discrimination
accompanies musical achievement; or that bright children tend
to be less neurotic than average children? If we know the general
intelligence of a child, as measured by a standard test, can we say
anything about his probable scholastic achievement as represented
by grades? Problems like these and many others which involve the
relations among abilities can be studied by the method of correlation.

When the relationship between two sets of measures is “linear,”
i.e., can be described by a straight line,* the correlation between
scores may be expressed by the “product-moment” coefficient of
correlation, designated by the letter r. The method of calculating r will
be outlined in Section III. Before taking up the details of calculation,
let us make clear what correlation means, and how r measures
relationship.

* See pages 154-158 for a further discussion of “linear” relationship.

Consider, first, a situation in which relationship is fixed and
unchanging. The circumference of a circle is always 3.1416 times its
diameter (C = 3.1416D), and this equation holds no matter how
large or how small the circle, or in what part of the world we find it.
Each time the diameter of a circle is increased or decreased, the
circumference is increased or decreased by just 3.1416 times the same
amount. In short, the dependence of circumference upon diameter is
absolute; the correlation between the two dimensions is said to be
perfect, and r = 1.00. In theory, at least, the relationship between
two abilities, as represented by test scores, may also be perfect.
Suppose that a hundred students have exactly the same standing in two
tests: the student who scores first in the one test scores first in the
other, the student who ranks second in the first test ranks second in
the other, and this one-to-one correspondence holds throughout the
entire list. The relationship is perfect, since the relative position of
each subject is exactly the same in one test as in the other; and the
coefficient of correlation is 1.00.

Now let us consider the case in which there is just no correlation
present. Suppose that we have administered to 100 college seniors
the Army General Classification Test and a simple “tapping test” in
which the number of separate taps made in thirty seconds is recorded.
Let the mean AGCT score for the group be 120, and the
mean tapping rate be 185 taps in thirty seconds. Now suppose that
when we divide our group into three sub-groups in accordance with
the size of their AGCT scores, the mean tapping rate of the superior
or “high” group (whose mean AGCT score is 130) is 184 taps in
thirty seconds; the mean tapping rate of the “middle” group (whose
mean AGCT score is 110) is 186 taps in thirty seconds; and the mean
tapping rate of the “low” group (whose mean AGCT score is 100) is
185 taps in thirty seconds. Since tapping rate is almost identical in
all three groups, it is clear that from tapping rate alone we should
be unable to draw any conclusion as to a student's probable performance
upon AGCT. A tapping rate of 185 is as likely to be found with
an AGCT score of 100 as with one of 120 or even 160. In other words,
there is no correspondence between the scores made by the members
of our group upon the two tests, and r, the coefficient of correlation,
is zero.*

* It may be noted that the number of groups (here 3) is unimportant: any
convenient set may be used. The important point is that when the correlation
is zero there is no systematic relationship between two sets of scores.

Perfect relationship, then, is expressed by a coefficient of 1.00, and
just no relationship by a coefficient of .00. Between these two limits,
increasing degrees of relationship are indicated by such coefficients as
.33, or .65, or .92. A coefficient of correlation falling between .00 and
1.00 always implies some degree of positive association, the degree
of correspondence depending upon the size of the coefficient.

Relationship may also be negative; that is, a high degree of one
trait may be associated with a low degree of another. When negative
or inverse relationship is perfect, r = −1.00. To illustrate, suppose
that in a small class of ten schoolboys, the boy who stands first in
Latin ranks lowest (tenth) in shop work; the boy who stands second
in Latin ranks next to the bottom (ninth) in shop work; and that
each boy stands just as far from the top of the list in Latin as from
the bottom of the list in shop work. Here the correspondence between
achievement in Latin and performance in shop work is one-to-one
and definite enough, but the direction of relationship is inverse and
r = −1.00. Negative coefficients may range from −1.00 up to .00,
just as positive coefficients may range from .00 up to 1.00. Coefficients
of −.20, −.50, or −.80 indicate increasing degrees of negative or
inverse relationship, just as positive coefficients of .20, .50, and .80
indicate increasing degrees of positive relationship.


2. Correlation expressed as agreement between ranks 


The notion underlying correlation can often be most readily
comprehended from a simple graphic treatment. Three examples will be
given to illustrate values of r of 1.00, −1.00, and approximately .00.
Correlation is rarely computed when the number of cases is less than
25, so that the examples here presented must be considered to have
illustrative value only.

Suppose that four tests, A, B, C, and D, have been administered to
a group of five children. The children have been arranged in order
of merit on Test A and their scores are then compared separately
with Tests B, C, and D to give the following three cases:


      Case 1                 Case 2                 Case 3
Pupil    A     B       Pupil    A     C       Pupil    A     D
  a     15    53         a     15    64         a     15   102
  b     14    52         b     14    65         b     14   100
  c     13    51         c     13    66         c     13   104
  d     12    50         d     12    67         d     12   103
  e     11    49         e     11    68         e     11   101


Now if the second series of scores under each case (i.e., B, C, and D)
is arranged in order of merit from the highest score down, and the two
scores earned by each child are connected by a straight line, we have
the following graphs:


     Case 1               Case 2               Case 3
    A      B             A      C             A      D
   15 --- 53            15     68            15     104
   14 --- 52            14     67            14     103
   13 --- 51            13     66            13     102
   12 --- 50            12     65            12     101
   11 --- 49            11     64            11     100

(A straight line joins the two scores of each child: in Case 2, 15
with 64, 14 with 65, and so on; in Case 3, 15 with 102, 14 with 100,
13 with 104, 12 with 103, and 11 with 101.)

Case 1. All connecting lines are horizontal and parallel, and the
correlation is positive and perfect. r = 1.00

Case 2. All connecting lines intersect in one point. The correlation
is negative and perfect, and r = -1.00

Case 3. No system is exhibited by the connecting lines, but the
resemblance is closer to Case 2 than to Case 1. Correlation low and
negative.

The more nearly the lines connecting the paired scores are horizontal
and parallel, the higher the positive correlation. The more nearly
the connecting lines tend to intersect in one point, the larger the
negative correlation. When the connecting lines show no systematic
trend, the correlation approaches zero.
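
The three cases can also be checked numerically. The sketch below is only an illustration: it applies the deviation form of the product-moment coefficient, which the chapter develops later, to the five pairs of scores in each case. The name `pearson_r` is ours, and the Test D score of pupil d is read as 103, the value the rank-ordered list of D scores implies.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

test_a = [15, 14, 13, 12, 11]           # pupils a to e
cases = {
    "Case 1 (A with B)": [53, 52, 51, 50, 49],
    "Case 2 (A with C)": [64, 65, 66, 67, 68],
    # ASSUMPTION: pupil d's D score is taken as 103.
    "Case 3 (A with D)": [102, 100, 104, 103, 101],
}
results = {name: pearson_r(test_a, scores) for name, scores in cases.items()}
for name, r in results.items():
    print(f"{name}: r = {r:.2f}")
```

Under these assumptions Case 3 comes out low and negative, which agrees with the description of the third graph.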


3. Summary 


To summarize our discussion up to this point, coefficients of
correlation range over a scale which extends from -1.00 through .00
to 1.00. A positive correlation indicates that large amounts of the
one variable tend to accompany large amounts of the other; a negative
correlation indicates that small amounts of the one variable tend
to accompany large amounts of the other. A zero correlation indicates
no consistent relationship. We have illustrated above only perfect
positive, perfect negative, and approximately zero correlation in
order to bring out the meaning of correlation in a striking way. Only
rarely, if ever, however, will a coefficient fall at either extreme of
the scale, i.e., at 1.00 or -1.00. In most actual problems, calculated
r's fall at intermediate points, such as .72, -.26, .50, etc. Such r's
are to be interpreted as “high” or “low” depending in general upon how
close they are to ±1.00. Interpretation of the degree of relationship
expressed by r in terms of various criteria will be discussed later on
pages 173-178.


II. The Coefficient of Correlation*


1. The coefficient of correlation as a ratio 


The product-moment coefficient of correlation may be thought of
essentially as that ratio which expresses the extent to which changes
in one variable are accompanied by, or are dependent upon, changes
in a second variable. As an illustration, consider the following
simple example which gives the paired heights and weights of five
college seniors:


  (1)      (2)      (3)     (4)   (5)   (6)     (7)       (8)         (9)
          Ht. in   Wt. in
Student   inches    lbs.
            X        Y       x     y    xy    x/σ_x     y/σ_y   (x/σ_x)(y/σ_y)
   a        72      170      3     0     0    1.34       .00        .00
   b        69      165      0    -5     0     .00      -.37        .00
   c        66      150     -3   -20    60   -1.34     -1.46       1.96
   d        70      180      1    10    10     .44       .73        .32
   e        68      185     -1    15   -15    -.44      1.10       -.48
                                        --                         ----
                                        55                         1.80

M_x = 69 in.      σ_x = 2.24 in.
M_y = 170 lbs.    σ_y = 13.69 lbs.

correlation = Σ(x/σ_x · y/σ_y) / N = 1.80 / 5 = .36


From the X and Y columns it is evident that tall students tend to be
somewhat heavier than short students, and hence the correlation
between height and weight is almost certainly positive. The mean
height is 69 inches, the mean weight 170 pounds, and the σ's are
2.24 inches and 13.69 pounds, respectively. In column (4) are given
the deviations (x's) of each man's height from the mean height, and
in column (5) the deviations (y's) of each man's weight from the
mean weight. The product of these paired deviations (xy's) is a
measure of the agreement between individual heights and weights,
and the larger the sum of the xy column the higher the degree of
correspondence. When agreement is perfect (and r = 1.00) the Σxy
column has its maximum value. One may wonder why the sum of
the xy's divided by N (i.e., Σxy/N) would not yield a suitable
measure of relationship between x and y. The answer is that such an
average is not a stable measure of relationship, as it depends upon
the units in which height and weight have been expressed, and
consequently will vary if centimeters and kilograms, say (as shown
in the example below), are employed instead of inches and pounds.
One way to avoid the troublesome matter of differences in units is to
divide each x and each y by its own σ, i.e., express each deviation as
a σ-score. The sum of the products of the σ-scores, column (9),
divided by N yields a ratio which, as we shall see later, is a stable
expression of relationship. This ratio is the “product-moment”*
coefficient of correlation. Its value of .36 indicates a fairly high
positive correlation between height and weight in this small sample.
The student should note that our ratio or coefficient is simply the
average product of the σ-scores of corresponding X and Y measures.

* This section may be taken up after Section III.
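
The arithmetic of the height-weight table can be retraced with a short script. This is a sketch, not the author's procedure: the helper names are ours, and the σ's are computed with N - 1 in the denominator, an assumption made only because it reproduces the printed values of 2.24 inches and 13.69 pounds.

```python
from math import sqrt

heights = [72, 69, 66, 70, 68]        # X, inches (students a to e)
weights = [170, 165, 150, 180, 185]   # Y, pounds

def deviations(values):
    """Deviations of each value from the mean of the series."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sigma(devs):
    # ASSUMPTION: N - 1 in the denominator; this matches the printed
    # sigmas of 2.24 in. and 13.69 lbs. for these five cases.
    return sqrt(sum(d * d for d in devs) / (len(devs) - 1))

x, y = deviations(heights), deviations(weights)
sigma_x, sigma_y = sigma(x), sigma(y)
sum_xy = sum(a * b for a, b in zip(x, y))                  # total of column (6)
sigma_products = [(a / sigma_x) * (b / sigma_y) for a, b in zip(x, y)]
ratio = sum(sigma_products) / len(x)                       # column (9) total / N

print(sum_xy, round(sigma_x, 2), round(sigma_y, 2), round(ratio, 2))
```

The last figure printed is the average product of the σ-scores, the ratio the paragraph above calls the coefficient of correlation.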

Let us now investigate the effect upon our ratio of changing the
units in terms of which X and Y have been expressed. In the example
below, the heights and weights of the same five students are expressed
(to the nearest whole number) in centimeters and kilograms instead
of in inches and pounds:


  (1)      (2)      (3)     (4)   (5)   (6)     (7)       (8)         (9)
          Ht. in   Wt. in
Student    cms.     kgs.
            X        Y       x     y    xy    x/σ_x     y/σ_y   (x/σ_x)(y/σ_y)
   a       183       77      8     0     0    1.43       .00        .00
   b       175       75      0    -2     0     .00      -.32        .00
   c       168       68     -7    -9    63   -1.25     -1.43       1.79
   d       178       82      3     5    15     .53       .80        .42
   e       173       84     -2     7   -14    -.36      1.11       -.40
                                        --                         ----
                                        64                         1.81

M_x = 175 cms.    σ_x = 5.61 cms.
M_y = 77 kgs.     σ_y = 6.30 kgs.

correlation = Σ(x/σ_x · y/σ_y) / N = 1.81 / 5 = .36


The mean height of our group is now 175 cms. and the mean weight
77 kgs.; the σ's are 5.61 cms. and 6.30 kgs., respectively. Note that
the sum of the xy column, namely, 64, differs by 9 from the sum of
the xy's in the example above, in which inches and pounds were the
units of measurement. However, when deviations are expressed as
σ-scores, the sum of their products, Σ(x/σ_x · y/σ_y), divided by N
equals .36 as before.
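
A brief sketch of the same point: the sum of the xy's changes with the units, while the average product of the σ-scores does not. The conversion factors (2.54 cm per inch, 0.4536 kg per pound) are applied without rounding, so the metric sum differs slightly from the 64 obtained above from whole-number centimeters and kilograms. The function name is illustrative, and N - 1 is again assumed in the σ's.

```python
from math import sqrt

def sum_xy_and_ratio(xs, ys):
    """Return (sum of xy, average product of sigma-scores) for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    # ASSUMPTION: N - 1 in the sigmas, matching the printed 2.24 and 13.69.
    sigma_x = sqrt(sum(d * d for d in x) / (n - 1))
    sigma_y = sqrt(sum(d * d for d in y) / (n - 1))
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy, sum_xy / (n * sigma_x * sigma_y)

inches = [72, 69, 66, 70, 68]
pounds = [170, 165, 150, 180, 185]
cms = [2.54 * h for h in inches]       # unrounded conversion
kgs = [0.4536 * w for w in pounds]     # approximate factor, unrounded

sum_english, ratio_english = sum_xy_and_ratio(inches, pounds)
sum_metric, ratio_metric = sum_xy_and_ratio(cms, kgs)
print(round(sum_english, 2), round(sum_metric, 2))      # the sums differ
print(round(ratio_english, 2), round(ratio_metric, 2))  # the ratios agree
```

The agreement of the two ratios does not depend on the choice of denominator in σ, since any constant multiplier of a variable cancels between the deviation and its σ.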


* The sum of the deviations from the mean (raised to some power) and
divided by N is called a “moment.” When corresponding deviations in
x and y are multiplied together, summed, and divided by N (to give
Σxy/N) the term “product-moment” is used.

The quotient

$$\frac{\sum \left( \dfrac{x}{\sigma_x} \cdot \dfrac{y}{\sigma_y} \right)}{N}$$

is a measure of relationship which remains constant for a given set
of data, no matter in what units X and Y are expressed. When this
ratio is written

$$\frac{\sum xy}{N \sigma_x \sigma_y}$$

it becomes the well-known expression for r, the product-moment
coefficient of correlation.*


2. The scatter diagram and the correlation table 


When N is small, the ratio method described in the preceding section
may be employed for computing the coefficient of correlation
between two sets of data. But when N is large, much time and labor
may be saved by first arranging the data in the form of a diagram or
chart, and then calculating deviations from assumed, instead of from
actual, means. Let us consider the diagram in Figure 35. This chart,
which is called a “scatter diagram” or “scattergram,” represents the
paired heights and weights of 120 college students. The construction
of a scattergram is relatively simple. Along the left-hand margin
from bottom to top are laid off the class-intervals of the height
distribution, measurement expressed in inches; and along the top of the
diagram from left to right are laid off the class-intervals of the
weight distribution, measurement expressed in pounds. Each of the
120 men is represented on the diagram with respect to height and
weight. Suppose that a man weighs 150 pounds and is 69 inches tall.
His weight locates him in the sixth column from the left, and his
height in the third row from the top. Accordingly, a “tally” is placed
in the third cell of the sixth column. There are three tallies in all in
this cell, that is, there are three men who weigh from 150 to 159
pounds, and are 68-69 inches tall. Each of the 120 men is represented
by a tally in a cell or square of the table in accordance with the two
characteristics, height and weight. Along the bottom of the diagram
in the f_x row is tabulated the number of men who fall in each
weight-interval; while along the right-hand margin in the f_y column is
tabulated the number of men who fall in each height-interval. The f_y
column and f_x row must each total 120, the number of men in all.
After all of the tallies have been listed, the frequency in each cell is
added and entered on the diagram. The scattergram is then a
correlation table.

* The coefficient of correlation is often called the “Pearson r” after
Professor Karl Pearson, who developed the product-moment method,
following the earlier work of Galton and Bravais. See Walker, H. M.,
Studies in the History of Statistical Method (Baltimore: Williams and
Wilkins Co., 1929), Chap.


FIG. 35 A scattergram and correlation table showing the paired heights
and weights of 120 students

[Scattergram: weight in pounds (X-variable) in class-intervals 100-109
to 170-179 across the top; height in inches (Y-variable) in
class-intervals 60-61 to 72-73 up the left-hand margin; marginal
entries f_y and M_wt at the right, f_x and M_ht at the bottom; N = 120.]

Summary

            Mean ht. for given                Mean wt. for given
Weight        wt. interval         Height       ht. interval
170-179          70.2              72-73           174.5
160-169          68.9              70-71           152.0
150-159          68.9              68-69           142.4
140-149          67.0              66-67           135.1
130-139          66.6              64-65           128.0
120-129          65.4              62-63           125.3
110-119          64.1              60-61           117.8
100-109          62.5
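
The tallying procedure can be sketched in a few lines. Since the 120 individual pairs are not listed in the text, the five (weight, height) pairs below are hypothetical and serve only to show how a man is assigned to a cell; the function and variable names are likewise ours, and whole-number measurements are assumed.

```python
from collections import Counter

WEIGHT_START, WEIGHT_WIDTH = 100, 10   # class-intervals 100-109, ..., 170-179
HEIGHT_START, HEIGHT_WIDTH = 60, 2     # class-intervals 60-61, ..., 72-73

def cell(weight, height):
    """Return the (weight-interval, height-interval) labels for one man."""
    w_low = WEIGHT_START + WEIGHT_WIDTH * ((weight - WEIGHT_START) // WEIGHT_WIDTH)
    h_low = HEIGHT_START + HEIGHT_WIDTH * ((height - HEIGHT_START) // HEIGHT_WIDTH)
    return (f"{w_low}-{w_low + WEIGHT_WIDTH - 1}",
            f"{h_low}-{h_low + HEIGHT_WIDTH - 1}")

# HYPOTHETICAL pairs (pounds, inches), for illustration only.
men = [(150, 69), (152, 68), (158, 69), (127, 64), (171, 72)]

table = Counter(cell(w, h) for w, h in men)   # tallies per cell
f_x, f_y = Counter(), Counter()               # marginal frequencies
for (weight_interval, height_interval), tally in table.items():
    f_x[weight_interval] += tally
    f_y[height_interval] += tally

print(cell(150, 69), table[("150-159", "68-69")])
print(sum(f_x.values()), sum(f_y.values()))   # both equal the number of men
```

The man of 150 pounds and 69 inches lands in the 150-159 column and the 68-69 row, as in the example above, and the two marginal totals agree, which is the check the text describes for the f_y column and f_x row.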


Several interesting facts may be gleaned from the correlation table
as it stands. For example, all of the men of a given weight-interval
may be studied with respect to the distribution of their heights. In
the third column there are twenty-eight men all of whom weigh
120-129 pounds. One of the twenty-eight is 70-71 inches tall; four are
68-69 inches tall; nine are 66-67 inches tall; seven are 64-65 inches
tall; and seven are 62-63 inches tall. In the same way, we may
classify all of the men of a given height-interval with respect to
weight distribution. Thus, in the row next to the bottom, there are
thirteen men all of whom are 62-63 inches tall. Of this group one
weighs 100-109 pounds; two weigh 110-119 pounds; seven weigh
120-129 pounds; one weighs 130-139 pounds; and two weigh 140-149
pounds. It is fairly clear that the “drift” of paired heights and
weights is from the upper right-hand section of the diagram to the
lower left-hand section. Even a superficial examination of the
diagram reveals a fairly marked tendency for heavy, medium, and light
men to be tall, medium, and short, respectively; and this general
relationship holds in spite of the scatter of heights and weights
within any given “array” (an array is the distribution of cases within
a given column or row). Even before making any calculations, then,
we should probably be willing to guess that the correlation between
height and weight is positive and fairly high.

Let us now go a step further and calculate the mean height of the
three men who weigh 100-109 pounds, the men in column one. The
mean height of this group (using the assumed mean method described
in Chapter 2, p. 36) is 62.5 inches, and this figure has been written
in at the bottom of the correlation table. In the same way, the mean
heights of the men who fall in each of the succeeding weight-intervals
have been written in at the bottom of the diagram. These data
have been tabulated in a somewhat more convenient form below the
diagram. From this summary, it appears that an actual weight
increase of approximately 70 pounds (104.5-174.5) corresponds to an
increase in mean height of 7.7 inches; that is, the increase from the
lightest to the heaviest man is paralleled by an increase of
approximately eight inches in height. It seems clear, therefore, that
the correlation between height and weight is positive.

Let us now shift from height to weight, and applying the method
used above, find the change in mean weight which corresponds to the
given change in height.* The mean weight of the three men in the
bottom row of the diagram is 117.8 pounds. The mean weight of the
thirteen men in the next row from the bottom (who are 62-63 inches
tall) is 125.3 pounds. The mean weights of the men who fall in the
other rows have been written in their appropriate places in the M_wt
column. In the summary of results we find that in this group of 120
men an increase of about 12 inches in height is accompanied by an
increase of about 56.7 pounds in mean weight. Thus it appears that
the taller the man the heavier he tends to be, and again the
correlation between height and weight is seen to be positive.

* This change corresponds to the second regression line in the
correlation diagram.
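
Two of the array means quoted above can be verified from the frequencies given in the text. The sketch takes each man at the midpoint of his class-interval (66.5 for 66-67 inches, 134.5 for 130-139 pounds, and so on) and computes an ordinary weighted mean, which gives the same result as the assumed mean method; `array_mean` is an illustrative name.

```python
def array_mean(freq_by_midpoint):
    """Mean of one array (column or row) from interval midpoints and frequencies."""
    n = sum(freq_by_midpoint.values())
    return sum(mid * f for mid, f in freq_by_midpoint.items()) / n

# Heights of the 28 men who weigh 120-129 pounds (third column).
heights_120_129 = {70.5: 1, 68.5: 4, 66.5: 9, 64.5: 7, 62.5: 7}

# Weights of the 13 men who are 62-63 inches tall (row next to the bottom).
weights_62_63 = {104.5: 1, 114.5: 2, 124.5: 7, 134.5: 1, 144.5: 2}

mean_height = array_mean(heights_120_129)
mean_weight = array_mean(weights_62_63)
print(round(mean_height, 1), round(mean_weight, 1))
```

These agree with the 65.4 inches and 125.3 pounds entered for those arrays in the summary of Figure 35.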


3. The graphic representation of the correlation coefficient 


It is often helpful in understanding how the correlation coefficient
measures relationship to see how a correlation of .00 or .50, say, looks
graphically. Figure 36 (1) pictures a correlation of .50. The data
in the table are artificial, and were selected to bring out the
relationship in as unequivocal a fashion as possible. The scores laid
off along the top of the correlation table from left to right will be
referred to simply as the X-test “scores,” and the scores laid off at
the left of the table from bottom to top as the Y-test “scores.” As was
done in Figure 35, the mean of each Y-row is entered on the chart, and
the means of the X-columns are entered at the bottom of the diagram.


FIG. 36 The graphical representation of the correlation coefficient

[Four correlation charts, each with X-test and Y-test class-intervals
0-9 to 40-49 and N = 64:
(1) r = .50; column means 14.5, 19.5, 24.5, 29.5, 34.5
(2) r = 1.00; column means 4.5, 14.5, 24.5, 34.5, 44.5
(3) r = .00; column means 24.5, 24.5, 24.5, 24.5, 24.5
(4) r = -.75; column means 39.5, 32.0, 24.5, 17.0, 9.5]


The means of each Y-array, that is, the means of the “scores” fall- 
ing in each X-column, are indicated on the chart by small crosses. 
Through these crosses a line, called a regression line,* has been 
drawn. This line represents the change in the mean value of Y over 
the given range of X. In similar fashion, the means of each X-array, 
i.e., the means of the scores in each Y-row, are designated on the 
chart by small circles, through which another line has been drawn. 
This second regression line shows the change in the mean value of X 
over the given range of Y. These two lines together represent the 
linear or straight-line relationship between the variables X and Y. 

The closeness of association or degree of correspondence between 
the X- and Y-tests is indicated by the relative positions of these 
two regression lines. When the correlation is positive and perfect, 
the two regression lines close up like a pair of scissors to form one 
line. Chart (2) in Figure 36 shows how the two regression lines look 
when r = 1.00, and the correlation is perfect. Note that the entries 
in Chart (2) are concentrated along the diagonal from the upper 
right- to the lower left-hand section of the diagram. There is no
“scatter” of scores in the successive columns or rows, all of the scores
in a given array being concentrated within one cell. If Chart (2) 
represented a correlation table of height and weight, we should know 
that the tallest man was the heaviest, the next tallest man the next 
heaviest, and that throughout the group the correspondence of height 
and weight was perfect. 

A very different picture from that of perfect correlation is pre- 
sented in Chart (3) where the correlation is .00. Here the two regres- 
sion lines, through the means of the columns and rows, have spread 
out until they are perpendicular to each other. There is no change in 
the mean Y-score over the whole range of X, and no change in the 
mean X-score over the whole range of Y. This is analogous to the 
situation described on page 123, in which the mean tapping rate of a 
group of students was the same for those with “high,” “middle,” and
“low” AGCT scores. When the correlation is zero, there is no way
of telling from a subject’s performance in one test what his perform- 
ance will be in the other test. The best one can do is to select the 
mean as the most probable value of the unknown score. 


* Regression lines have important properties; they will be defined and
discussed later.

Chart (4) in Figure 36 represents a correlation coefficient of -.75.
Negative relationship is shown by the fact that the regression lines
through the means of the columns and rows run from the upper left-
to the lower right-hand section of the diagram. The regression lines
are closer together than in Chart (1) where the correlation is .50, but
are still separated. If this chart represented a correlation table of
height and weight, we should know that the tendency was strong for
tall men to be light, and for short men to be heavy.


FIG. 37 Graphical representation of the correlation between height and
weight in a group of 120 college students (Fig. 35)

[Correlation table of Figure 35, weight in pounds (X) against height in
inches (Y), with the column means marked by crosses, the row means by
circles, and the two regression lines drawn in.]


The charts in Figure 36 represent, as was stated above, a linear
relationship between sets of artificial test scores. The data were
selected so as to be symmetrical around the means of each column
and row, and hence the regression lines go through all of the crosses
and through all of the circles in the successive columns and rows. It
is seldom if ever true, however, that the regression lines pass through
all of the means of the columns and rows in a correlation table which
represents actual test scores or other real measures. Figure 37, which
reproduces the correlation table of heights and weights given on
page 129, illustrates this fact. The mean heights of the men in the
weight (X) columns are indicated by crosses, and the mean weights
of the men in the height (Y) rows by circles, as in Figure 36. Note
that the series of short lines joining the successive crosses or circles
presents a decidedly jagged appearance. Two straight lines have been
drawn in to describe the general trend of these irregular lines. These
two lines go through, or as close as possible to, the crosses or the
circles, more consideration being given to those points near the middle
of the chart (because they are based upon more data) than to those
at the extremes (which are based upon few scores). Regression lines
are called lines of “best fit” because they satisfy certain mathematical
criteria to be given later (p. 154). Such lines describe better than
any other straight lines the “run” or “drift” of the crosses and circles
across the chart.

In Chapter 7 we shall develop equations for the "best fitting" lines 
and show how they may be drawn in to describe the trend of irregu- 
lar points on a correlation table. For the present, the important fact 
to get clearly in mind is that when correlation is linear, the means 
of the columns and rows in a correlation table can be adequately 
described by two straight lines and the closer together these two lines, 
the higher the correlation. 


III. The Calculation of the Coefficient of Correlation by the
Product-Moment Method


1. The calculation of r from a correlation table


Having discussed the meaning of correlation in the last sections, 
we shall now proceed to the calculation of the coefficient of correla- 
tion by the product-moment method. Figure 38 will serve as an illus- 
tration of the computations required. This correlation table gives the 
paired heights and weights of 120 college students, and is derived 
from the scattergram for the same data shown in Figure 35. The
following outline of the steps in the process of calculating r will be
best understood if the student will constantly refer to Figure 38 as he
reads through each step.


Step 1


Construct a scattergram for the two variables to be correlated, 
and from it draw up a correlation table as described on page 128. 


FIG. 38 Calculation of the product-moment coefficient of correlation
between the heights and weights of 120 college students

[Correlation table of Figure 35, weight in pounds (X-variable) against
height in inches, with the added columns fy′, fy′², Σx′, and Σx′y′
(+ and -), the corresponding rows fx′, fx′², Σy′, and Σx′y′, and the
calculation of c_x, c_y, σ_x, σ_y, and r.]


Step 2 


The distribution of heights for the 120 men is in the f_y column at
the right of the diagram. Assume a mean for the height distribution,
using the rules given in Chapter 2, page 37, and draw double lines
to mark off the row in which the assumed mean (ht) falls. The
mean for the height distribution has been taken at 66.5 in. (midpoint
of interval 66-67) and the y′'s have been taken from this point. The
prime (′) of the x′'s and y′'s indicates that these deviations are taken
from the assumed means of the X and Y distributions (see p. 37).
Now fill in the fy′ and fy′² columns. From the first column c_y, the
correction in units of interval, is obtained; and this correction
together with the sum of the fy′² will give the σ of the height
distribution, σ_y. As shown by the calculations in Figure 38, the value
of σ_y is 2.62 inches.

The distribution of the weights of the 120 men is in the f_x row at
the bottom of the diagram. Assume a mean for the weight distribution,
and draw double lines to designate the column under the assumed
mean (wt). The mean for the weight distribution is taken at
134.5 pounds (midpoint of interval 130-139), and the x′'s are taken
from this point. Fill in the fx′ and the fx′² rows; from the first
calculate c_x, the correction in units of interval, and from the second
calculate σ_x, the σ of the entire weight distribution. In Figure 38,
the value of σ_x is found to be 15.54 pounds.
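
Step 2 for the weight distribution can be sketched as follows. The frequencies are our reading of the f_x row of Figure 35 and should be treated as an assumption; they total 120 and reproduce the correction and σ quoted in the text. The function name is illustrative.

```python
from math import sqrt

def assumed_mean_stats(freq_by_deviation, assumed_mean, interval):
    """Correction, mean and sigma by the assumed mean method.

    freq_by_deviation maps x' (deviation from the AM in class-intervals)
    to the frequency of that interval.
    """
    n = sum(freq_by_deviation.values())
    c = sum(f * d for d, f in freq_by_deviation.items()) / n       # sum fx' / N
    mean_sq = sum(f * d * d for d, f in freq_by_deviation.items()) / n
    sigma_in_intervals = sqrt(mean_sq - c * c)
    return n, c, assumed_mean + c * interval, sigma_in_intervals * interval

# ASSUMPTION: f_x row as read from Figure 35, intervals 100-109 to 170-179,
# with the assumed mean at 134.5 (interval 130-139, x' = 0).
weight_freqs = {-3: 3, -2: 10, -1: 28, 0: 37, 1: 22, 2: 9, 3: 5, 4: 6}

n, c_x, mean_weight, sigma_x = assumed_mean_stats(weight_freqs, 134.5, 10)
print(n, round(c_x, 2), round(mean_weight, 1), round(sigma_x, 2))
```

With these frequencies the correction is .18 interval and σ_x is 15.54 pounds, matching the values used in Step 2 and in Step 5.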


Step 3


The calculations in Step 2 simply repeat the now familiar process
of calculating σ by the Assumed Mean method. Our first new task is
to fill in the Σx′y′ column at the right of the chart. Since the entries
in this column may be either + or -, two columns are provided
under Σx′y′. Calculation of the entries in the Σx′y′ column may be
illustrated by considering, first, the single entry in the only occupied
cell in the topmost row. The deviation of this cell from the AM of
the weight distribution, that is, its x′, is four intervals, and its
deviation from the AM of the height distribution, that is, its y′, is
three intervals. Hence the product of the deviations of this cell from
the two AM's is 4 × 3 or 12; and a small figure (12) is placed in the
upper right-hand corner of the cell.* The “product-deviation” of the
one entry in this cell is 1(4 × 3) or 12 also, and hence a figure 12 is
placed in the lower left-hand corner of the cell. This figure shows the
product of the deviations of this single entry from the AM's of the
two distributions. Since there are no other entries in the cells of this
row, 12 is placed at once under the + sign in the Σx′y′ column.

* The x′ of this cell is found by counting over four intervals from the
vertical column containing the AM (wt), and the y′ by counting up three
intervals from the horizontal row containing the AM (ht). The unit of
measurement is the class-interval.

Consider now the next row from the top, taking the cells in order
from right to left. The cell immediately below the one for which we
have just found the product-deviation also deviates four intervals
from the AM (wt) (its x′ is 4), but its deviation from the AM (ht)
is only two intervals (its y′ is 2). The product-deviation of this cell,
therefore, is 4 × 2 or 8, as shown by the small figure (8) in the upper
right-hand corner of the cell. There are three entries in this cell, and
since each has a product-deviation of 8, the final entry in the lower
left-hand corner of the cell is 3(4 × 2) or 24. The product-deviation
of the second cell in this row is 6 (its x′ is 3 and its y′ is 2) and since
there are two entries in the cell, the final entry is 2(3 × 2) or 12.
Each of the four entries in the third cell over has a product-deviation
of 4 (since x′ = 2 and y′ = 2) and the final entry is 16. In the fourth
cell, each of the three entries has a product-deviation of 2 (x′ = 1 and
y′ = 2) and the cell entry is 6. The entry in the fifth cell over, the
cell in the AM (wt) column, is 0, since x′ is 0, and accordingly
3(2 × 0) must be 0. Note carefully the entry (-2) in the last cell
of the row. Since the deviations of this cell are x′ = -1 and y′ = 2,
the product 1(-1 × 2) = -2, and the final entry is negative. Now
we may total up the plus and minus entries in this row and enter the
results, 58 and -2, in the Σx′y′ column under the appropriate signs.
The final entries in the cells for the other rows of the table and the
sums of the product-deviations of each row are obtained as illustrated
for the two rows above. The reader should bear in mind in
calculating x′y′'s that the product-deviations of all entries in the
cells in the first and third quadrants of the table are positive, while
the product-deviations of all entries in the second and fourth
quadrants are negative. It should be remembered, too, that all
entries either in the column headed by the AM_x or in the row headed
by the AM_y have zero product-deviations, since in the one case the x′
and in the other the y′ equals zero.

Since all entries in a given row have the same y′, the work
of calculating x′y′'s may often be considerably reduced if each entry
in a row-cell is first multiplied by its x′, and the sum of these
deviations (Σx′) multiplied once for all by the common y′, viz., the y′
of the row. The last two columns Σx′ and Σx′y′ contain the entries for
the rows. To illustrate the method of calculation, in the second
row from the bottom, taking the cells in order from right to
left, and multiplying the entry in each cell by its x′, we have
(2 × 1) + (1 × 0) + (7 × -1) + (2 × -2) + (1 × -3) or -12. If
we multiply this “deviation-sum” by the y′ of the whole row (i.e.,
by -2) the result is 24, which is the final entry in the Σx′y′ column.
Note that this entry checks the 28 and -4 entered separately in the
Σx′y′ column by the longer method. This shorter method is often
employed in printed correlation charts and is recommended for use
as soon as the student understands fully how the cell entries are
obtained.
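
The two rows worked through in Step 3 can be reproduced with a short sketch. Each row is written as a mapping from x′ to the number of entries in that cell, using the frequencies stated in the text; the function names are illustrative.

```python
def row_product_deviations(row, y_dev):
    """Longer method: (+ total, - total) of f * x' * y' over the cells of a row."""
    products = [f * x_dev * y_dev for x_dev, f in row.items()]
    plus = sum(p for p in products if p > 0)
    minus = sum(p for p in products if p < 0)
    return plus, minus

def row_shortcut(row, y_dev):
    """Shorter method: the deviation-sum (sum of f * x') times the common y'."""
    return sum(f * x_dev for x_dev, f in row.items()) * y_dev

# x' -> frequency, cells taken from right to left as in the text.
row_70_71 = {4: 3, 3: 2, 2: 4, 1: 3, 0: 3, -1: 1}   # y' = 2
row_62_63 = {1: 2, 0: 1, -1: 7, -2: 2, -3: 1}       # y' = -2

print(row_product_deviations(row_70_71, 2))    # (58, -2)
print(row_product_deviations(row_62_63, -2))   # (28, -4)
print(row_shortcut(row_62_63, -2))             # 24
```

For either row the shortcut equals the algebraic sum of the plus and minus totals, which is why the two methods check each other.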


Step 4 (Checks)


The Σx′y′ may be checked by computing the product-deviations
and summing for columns instead of rows. The two rows at the bottom
of the diagram, Σy′ and Σx′y′, show how this is done. We may
illustrate with the first column on the left, taking the cells from top
to bottom. Multiplying the entry in each cell by its appropriate y′,
we have (1 × -1) + (1 × -2) + (1 × -3) or -6. When this entry
in the Σy′ row is multiplied by the common x′ of the column (i.e.,
by -3) the final entry in the Σx′y′ row is 18. The sum of the x′y′
computed from the rows should check the sum of the x′y′ computed
from the columns.

Two other useful checks are shown in Figure 38. The fy′ will equal
the Σy′ and the fx′ will equal the Σx′ if no error has been made. The
fy′ and the fx′ are the same as the Σy′ and Σx′; although these columns
and rows are designated differently, they denote in each case
the sum of deviations around their AM.
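
The row-against-column check holds for any set of cells, so it can be illustrated without the complete table. The sketch below uses only the cells whose frequencies are quoted in the text (the top row, the 70-71 row, the 62-63 row, and the first column); it is a partial table, so its total is not the Σx′y′ of Figure 38. Names are illustrative.

```python
from collections import defaultdict

# (x', y') -> frequency. PARTIAL table: only cells quoted in the text.
cells = {
    (4, 3): 1,
    (4, 2): 3, (3, 2): 2, (2, 2): 4, (1, 2): 3, (0, 2): 3, (-1, 2): 1,
    (1, -2): 2, (0, -2): 1, (-1, -2): 7, (-2, -2): 2, (-3, -2): 1,
    (-3, -1): 1, (-3, -3): 1,
}

sum_x_by_row = defaultdict(int)   # deviation-sum of x' for each row (keyed by y')
sum_y_by_col = defaultdict(int)   # deviation-sum of y' for each column (keyed by x')
for (x_dev, y_dev), f in cells.items():
    sum_x_by_row[y_dev] += f * x_dev
    sum_y_by_col[x_dev] += f * y_dev

total_from_rows = sum(s * y_dev for y_dev, s in sum_x_by_row.items())
total_from_cols = sum(s * x_dev for x_dev, s in sum_y_by_col.items())
first_column_entry = sum_y_by_col[-3] * -3

print(first_column_entry)                 # 18, as in the worked column
print(total_from_rows, total_from_cols)   # the two totals agree
```

The same loop also yields the other two checks: the total of the row deviation-sums is the sum of the fx′ values, and the total of the column deviation-sums is the sum of the fy′ values.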


Step 5 


When all of the entries in the Σx′y′ column have been made, and
the column totaled, the coefficient of correlation may be calculated
by the formula

$$r = \frac{\dfrac{\sum x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (22)$$

(coefficient of correlation when deviations are taken from
the assumed means of the two distributions) *

* This formula for r differs slightly from the ratio formula developed
on page 128. The fact that deviations are taken from assumed rather
than from actual means makes it necessary to correct Σx′y′/N by
subtracting the product of the two corrections c_x and c_y.

Substituting 146 for Σx′y′; .02 for c_y; .18 for c_x; 1.31 for σ_y; 1.55
for σ_x; and 120 for N, r is found to be .60. (See Fig. 38.)

It is very important to remember that c_x, c_y, σ_x, and σ_y are all left
in units of class-interval in formula (22). This is done because all
product-deviations (x′y′'s) are in interval-units, and it is desirable
therefore to keep all of the terms in the formula in interval-units.
Leaving the corrections and the two σ's in units of class-interval
facilitates computation, and does not change the result (i.e., the
value of the coefficient of correlation).
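
As a check on the substitution in Step 5, formula (22) can be evaluated directly from the figures quoted in the text. This is a sketch with an illustrative function name; every quantity is in class-interval units, as the text requires.

```python
def r_from_assumed_means(sum_xy_dev, n, c_x, c_y, sigma_x, sigma_y):
    """Formula (22): (sum x'y' / N - c_x * c_y) / (sigma_x * sigma_y).

    All arguments are expressed in units of class-interval.
    """
    return (sum_xy_dev / n - c_x * c_y) / (sigma_x * sigma_y)

r = r_from_assumed_means(146, 120, c_x=0.18, c_y=0.02,
                         sigma_x=1.55, sigma_y=1.31)
print(round(r, 2))   # 0.6
```

Multiplying the two σ's by their interval widths (10 pounds and 2 inches) gives roughly the 15.54 pounds and 2.62 inches reported in Step 2, but those converted values are not needed for r.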


2. The calculation of r from ungrouped data 


(1) THE FORMULA FOR r WHEN DEVIATIONS ARE TAKEN FROM THE 
MEANS OF THE TWO DISTRIBUTIONS X AND Y 
In formula (22) x′ and y′ deviations are taken from assumed
means; and hence it is necessary to correct Σx′y′/N by the product of
the two corrections, c_x and c_y (p. 138). When deviations have been
taken from the actual means of the two distributions, instead of from
assumed means, no correction is needed, as both c_x and c_y are zero.
Under these conditions, formula (22) becomes

$$r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad (23)$$

(coefficient of correlation when deviations are taken from
the means of the two distributions)

which is the ratio for measuring correlation developed on page 128.

If we write $\sqrt{\sum x^2 / N}$ for σ_x and $\sqrt{\sum y^2 / N}$ for σ_y, the N's
cancel and formula (23) becomes

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} \qquad (24)$$

(coefficient of correlation when deviations are taken from
the means of the two distributions)

in which x and y are deviations from the actual means as in (23) and
Σx² and Σy² are the sums of the squared deviations in x and y taken
from the two means.

When N is fairly large, so that the data can be grouped into a
correlation table, formula (22) is always used in preference to formulas
(23) or (24) as it entails much less calculation. Formulas (23) and
(24) may be used to good advantage, however, in finding the
correlation between short, ungrouped series (say, twenty-five cases or so).
It is not necessary to tabulate the scores into a frequency distribution.
An illustration of the use of formula (24) is given in Table 18,
below. The problem is to find the correlation between the scores
made by twelve adults on two tests of “controlled association.”

The steps in computing r may be outlined as follows:


Step 1


Find the mean of Test 1 (X) and the mean of Test 2 (Y). The 
means in Table 18 are 62.5 and 30.4, respectively. 


Step 2 

Find the deviation of each score on Test 1 from its mean, 62.5, and
enter it in column x. Next find the deviation of each score in Test 2
from its mean, 30.4, and enter it in column y.


Step 3 


Square all of the x's and all of the y's and enter these squares in
columns x² and y², respectively. Total these columns to obtain Σx²
and Σy².


TABLE 18 To illustrate the calculation of r from ungrouped scores
when deviations are taken from the means of the series

          Test 1  Test 2
Subject     X       Y        x       y        x²       y²       xy
A           50      22    -12.5    -8.4    156.25    70.56   105.00
B           54      25     -8.5    -5.4     72.25    29.16    45.90
C           56      34     -6.5     3.6     42.25    12.96   -23.40
D           59      28     -3.5    -2.4     12.25     5.76     8.40
E           60      26     -2.5    -4.4      6.25    19.36    11.00
F           62      30      -.5     -.4       .25      .16      .20
G           61      32     -1.5     1.6      2.25     2.56    -2.40
H           65      30      2.5     -.4      6.25      .16    -1.00
I           67      28      4.5    -2.4     20.25     5.76   -10.80
J           71      34      8.5     3.6     72.25    12.96    30.60
K           71      36      8.5     5.6     72.25    31.36    47.60
L           74      40     11.5     9.6    132.25    92.16   110.40
           750     365                     595.00   282.92   321.50
                                           (Σx²)    (Σy²)    (Σxy)

M_X = 62.5    M_Y = 30.4

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} = \frac{321.50}{\sqrt{595 \times 282.92}} = .78 \tag{24}$$


Step 4

Multiply the x's and y's in the same rows, and enter these products
(with due regard for sign) in the xy column. Total the xy column,
taking account of sign, to get $\Sigma xy$.


Step 5 


Substitute for $\Sigma xy$, 321.50; for $\Sigma x^2$, 595; and for $\Sigma y^2$, 282.92 in
formula (24), as shown in Table 18, and solve for r.
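
The five steps can also be traced in code. What follows is a minimal Python sketch, not part of the original text; the function name `r_from_deviations` and the variable names are illustrative assumptions. It applies formula (24) to the twelve pairs of scores in Table 18.

```python
from math import sqrt

# Scores of the twelve subjects in Table 18.
test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]  # X
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]  # Y


def r_from_deviations(xs, ys):
    """Formula (24): r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n                          # Step 1
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]                 # Step 2
    dy = [y - mean_y for y in ys]
    sum_x2 = sum(d * d for d in dx)               # Step 3
    sum_y2 = sum(d * d for d in dy)
    sum_xy = sum(a * b for a, b in zip(dx, dy))   # Step 4
    return sum_xy / sqrt(sum_x2 * sum_y2)         # Step 5


r = r_from_deviations(test1, test2)
print(round(r, 2))  # 0.78
```

The sketch carries the mean of Test 2 unrounded (30.4167 rather than 30.4), so its sum of squared y-deviations differs slightly from the tabled 282.92; r still rounds to .78.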
While formula (24) is useful in calculating r directly from two
ungrouped series of scores, it has the same disadvantage as the “long
method” of calculating means and σ's described in Chapters 2 and 3.
The deviations x and y when taken from the actual means are usually
decimals, and the multiplication and squaring of these values is often
a tedious task. For this reason, even when working with short
ungrouped series, it is often easier to assume means, calculate
deviations from these AM's, and apply formula (22). The procedure is
illustrated in Table 19 with the same data given in Table 18. Note


TABLE 19 To illustrate the calculation of r from ungrouped scores when
deviations are taken from the assumed means of the series

          Test 1  Test 2
Subject     X       Y      x'     y'     x'²    y'²    x'y'
A           50      22    -10     -8     100     64     80
B           54      25     -6     -5      36     25     30
C           56      34     -4      4      16     16    -16
D           59      28     -1     -2       1      4      2
E           60      26      0     -4       0     16      0
F           62      30      2      0       4      0      0
G           61      32      1      2       1      4      2
H           65      30      5      0      25      0      0
I           67      28      7     -2      49      4    -14
J           71      34     11      4     121     16     44
K           71      36     11      6     121     36     66
L           74      40     14     10     196    100    140
           750     365                   670    285    334
                                        (Σx'²) (Σy'²) (Σx'y')

AM_X = 60.0    AM_Y = 30.0
M_X = 62.5     M_Y = 30.4
c_x = 2.5      c_y = .4
c_x² = 6.25    c_y² = .16

$$\sigma_x = \sqrt{\frac{670}{12} - 6.25} = 7.04 \qquad \sigma_y = \sqrt{\frac{285}{12} - .16} = 4.86$$

$$r = \frac{\frac{334}{12} - 1.00}{7.04 \times 4.86} = .78 \tag{22}$$
that the two means, $M_X$ and $M_Y$, are first calculated. The
corrections, $c_x$ and $c_y$, are found by subtracting $AM_X$ from $M_X$ and $AM_Y$
from $M_Y$ (p. 38). Since deviations are taken from assumed means,
fractions are avoided; and the calculations of $\Sigma x'^2$, $\Sigma y'^2$, $\Sigma x'y'$ are
readily made. Substitution in formula (22) then gives r.
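
The assumed-mean procedure of Table 19 can be checked with a short Python sketch. It is an illustration only: the function name `r_from_assumed_means` is an assumption, and formula (22) is written here as it is applied in Table 19, with a class interval of 1 because the scores are ungrouped.

```python
from math import sqrt

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]  # X
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]  # Y


def r_from_assumed_means(xs, ys, am_x, am_y):
    """r = (sum(x'y')/N - c_x*c_y) / (sigma_x * sigma_y),
    with x', y' taken as deviations from the assumed means."""
    n = len(xs)
    dx = [x - am_x for x in xs]          # x' = X - AM_X
    dy = [y - am_y for y in ys]          # y' = Y - AM_Y
    c_x = sum(dx) / n                    # correction, M_X - AM_X
    c_y = sum(dy) / n                    # correction, M_Y - AM_Y
    sigma_x = sqrt(sum(d * d for d in dx) / n - c_x ** 2)
    sigma_y = sqrt(sum(d * d for d in dy) / n - c_y ** 2)
    mean_product = sum(a * b for a, b in zip(dx, dy)) / n
    return (mean_product - c_x * c_y) / (sigma_x * sigma_y)


r = r_from_assumed_means(test1, test2, am_x=60, am_y=30)
print(round(r, 2))  # 0.78
```

Any other pair of assumed means gives the same r, since the corrections $c_x$ and $c_y$ absorb the difference; only the size of the numbers to be squared changes.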


(2) THE CALCULATION OF r FROM RAW SCORES, I.E., WHEN DEVIATIONS
ARE TAKEN FROM ZERO


The calculation of r may often be carried out most readily,
especially when a calculating machine is available, by means of the
following formula which is based upon “raw” or obtained scores:

$$r = \frac{\Sigma XY - N M_X M_Y}{\sqrt{[\Sigma X^2 - N M_X^2][\Sigma Y^2 - N M_Y^2]}} \tag{25}$$

(coefficient of correlation calculated from raw or obtained scores)

In this formula, X and Y are obtained scores, and $M_X$ and $M_Y$ are
the means of the X and Y series, respectively. $\Sigma X^2$ and $\Sigma Y^2$ are the
sums of the squared X and Y values, and N is the number of cases.

Formula (25) is derived directly from formula (22) by assuming
the means of the X and Y tests to be zero. If $AM_X$ and $AM_Y$ are zero,
each X and Y score is a deviation from its AM as it stands, and hence
we work with the scores themselves. Since the correction, c, always
equals M - AM, it follows that when the AM equals 0, $c_x = M_X$,
$c_y = M_Y$ and $c_x c_y = M_X M_Y$. Furthermore, when $c_x = M_X$ and
$c_y = M_Y$ and the “scores” are “deviations,” the formula

$$\sigma_x = \sqrt{\frac{\Sigma x'^2}{N} - c_x^2} \times \text{interval}$$

(see p. 54) becomes

$$\sigma_x = \sqrt{\frac{\Sigma X^2}{N} - M_X^2}$$

and $\sigma_y$ for the same reason equals $\sqrt{\frac{\Sigma Y^2}{N} - M_Y^2}$. If we substitute
these equivalents for $c_x c_y$, $\sigma_x$, and $\sigma_y$ in formula (22), the formula for
r in terms of raw scores given in (25) is obtained.


An alternate form of (25) is often more useful in practice. This is

$$r = \frac{N \Sigma XY - \Sigma X \, \Sigma Y}{\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} \tag{26}$$

(coefficient of correlation calculated from raw or obtained scores)

This formula is obtained from (25) by substituting $\frac{\Sigma X}{N}$ for $M_X$, and
$\frac{\Sigma Y}{N}$ for $M_Y$ in numerator and denominator, and canceling the N's.
The calculation of r from original scores is shown in Table 20.
The data are again the two sets of twelve scores obtained on the
“controlled association” tests, the correlation for which was found
to be .78 in Table 18. This short example is for the purpose of
illustrating the arithmetic and must not be taken as a recommendation
that formula (25) be used only with short series. As a matter of fact,
formula (25) or (26) is most useful, perhaps, with long series,
especially if one is working with a calculating machine.
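
As an illustration of the two raw-score forms, the following Python sketch evaluates formulas (25) and (26) on the same twelve pairs of scores. The function names `r_formula_25` and `r_formula_26` are illustrative assumptions, not names from the text.

```python
from math import sqrt

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]  # X
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]  # Y


def raw_sums(xs, ys):
    """Return N, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs),
            sum(y * y for y in ys),
            sum(x * y for x, y in zip(xs, ys)))


def r_formula_25(xs, ys):
    """Formula (25): raw scores and the two means."""
    n, sx, sy, sx2, sy2, sxy = raw_sums(xs, ys)
    m_x, m_y = sx / n, sy / n
    return (sxy - n * m_x * m_y) / sqrt(
        (sx2 - n * m_x ** 2) * (sy2 - n * m_y ** 2))


def r_formula_26(xs, ys):
    """Formula (26): raw scores only, no means needed."""
    n, sx, sy, sx2, sy2, sxy = raw_sums(xs, ys)
    return (n * sxy - sx * sy) / sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


r25 = r_formula_25(test1, test2)
r26 = r_formula_26(test1, test2)
print(round(r25, 2), round(r26, 2))  # 0.78 0.78
```

Formula (26) works entirely in whole numbers until the final division when the scores are integers, which is one reason it suits machine calculation.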


TABLE 20 To illustrate the calculation of r from ungrouped data when
deviations are original scores (AM's = 0)

          Test 1  Test 2
Subject     X       Y       X²       Y²       XY
A           50      22     2500      484     1100
B           54      25     2916      625     1350
C           56      34     3136     1156     1904
D           59      28     3481      784     1652
E           60      26     3600      676     1560
F           62      30     3844      900     1860
G           61      32     3721     1024     1952
H           65      30     4225      900     1950
I           67      28     4489      784     1876
J           71      34     5041     1156     2414
K           71      36     5041     1296     2556
L           74      40     5476     1600     2960
           750     365    47470    11385    23134

M_X = 62.50    M_Y = 30.42    (means to two decimals)

$$r = \frac{23134 - 12 \times 62.50 \times 30.42}{\sqrt{[47470 - 12 \times (62.50)^2][11385 - 12 \times (30.42)^2]}} = .78 \tag{25}$$


The computation by formula (26) is straightforward and the
method easy to follow, but the calculations become tedious if the
scores are expressed in more than two digits. When using this
formula, therefore, it will often greatly lessen the arithmetical work, if
we “reduce” the original scores by subtracting a constant quantity
from each of the original X and Y scores. In Table 21, the same
two series of twelve scores have been reduced by subtracting 65 from
each of the X scores, and 25 from each of the Y scores. The reduced
scores, entered in the table under X' and Y', are first squared to give

TABLE 21 To illustrate the calculation of r from ungrouped data when
deviations are original scores (AM's = 0)

Scores are “reduced” by the subtraction of 65 from each X, and 25
from each Y to give X' and Y'.

          Test 1  Test 2
Subject     X       Y      X'     Y'     X'²    Y'²    X'Y'
A           50      22    -15     -3     225      9     45
B           54      25    -11      0     121      0      0
C           56      34     -9      9      81     81    -81
D           59      28     -6      3      36      9    -18
E           60      26     -5      1      25      1     -5
F           62      30     -3      5       9     25    -15
G           61      32     -4      7      16     49    -28
H           65      30      0      5       0     25      0
I           67      28      2      3       4      9      6
J           71      34      6      9      36     81     54
K           71      36      6     11      36    121     66
L           74      40      9     15      81    225    135
           750     365    -30     65     670    635    159
                         (ΣX')  (ΣY')  (ΣX'²) (ΣY'²) (ΣX'Y')

$$M_X = \frac{\Sigma X'}{N} + 65 = \frac{-30}{12} + 65 = -2.5 + 65 = 62.5$$

$$M_Y = \frac{\Sigma Y'}{N} + 25 = \frac{65}{12} + 25 = 5.4 + 25 = 30.4$$

$$r = \frac{12 \times 159 - (-30 \times 65)}{\sqrt{[12 \times 670 - (-30)^2][12 \times 635 - (65)^2]}} = \frac{3858}{4923} = .78 \tag{26}$$


$\Sigma X'^2$ and $\Sigma Y'^2$, and then multiplied by rows to give $\Sigma X'Y'$.
Substitution of these values in formula (26) gives the coefficient of
correlation r. If the means of the two series are wanted, these may
readily be found by adding to $\frac{\Sigma X'}{N}$ and $\frac{\Sigma Y'}{N}$ the amounts by which
the X and Y scores were reduced (see computations in Table 21).

The method of computing r by first reducing the scores is usually
superior to the method of applying formula (25) or (26) directly to
the raw scores. This is because we deal with smaller whole numbers;
and much of the arithmetic can be done mentally. When raw scores
have more than two digits, they are cumbersome to square and
multiply unless reduced. The student should note that instead of 65 and
25 other constants might have been used to reduce the X and Y
scores. If the smallest X and Y scores had been subtracted, namely,
50 and 22, all of the X' and Y' would, of course, have been positive.
This is an advantage in machine calculation but these reduced scores
would have been somewhat larger numerically than are the reduced
scores in Table 21. In general, the best plan in reducing scores is to
subtract constants which are close to the means. The reduced scores
are then both plus and minus, but are numerically about as small as
we can make them.
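
That the choice of reducing constants does not affect r can be verified directly. The Python sketch below is an illustration under assumed names (`r_raw`, `reduce_scores`); it applies formula (26) to the raw scores, to the scores reduced by 65 and 25 as in Table 21, and to the scores reduced by the smallest values, 50 and 22.

```python
from math import sqrt

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]  # X
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]  # Y


def r_raw(xs, ys):
    """Formula (26) applied to whatever scores are supplied."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


def reduce_scores(scores, constant):
    """Subtract a constant from every score."""
    return [s - constant for s in scores]


x_red = reduce_scores(test1, 65)   # X' of Table 21
y_red = reduce_scores(test2, 25)   # Y' of Table 21

r_original = r_raw(test1, test2)
r_reduced = r_raw(x_red, y_red)
r_positive = r_raw(reduce_scores(test1, 50), reduce_scores(test2, 22))

# The means are recovered by adding the constants back.
mean_x = sum(x_red) / len(x_red) + 65
mean_y = sum(y_red) / len(y_red) + 25
print(round(r_reduced, 2), mean_x, round(mean_y, 1))
```

All three values of r agree, and the recovered means are 62.5 and 30.4, as in Table 21.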


(3) THE CALCULATION OF r BY THE DIFFERENCE-FORMULA


It is apparent from the preceding sections that the product-moment
formula for r may be written in several ways, depending upon
whether deviations are taken from actual or assumed means and
upon whether raw scores or deviations are employed. The present
section contributes still another formula for calculating r, namely,
the difference-formula. This formula will complete our list of
expressions for r, as it is believed that the student who understands the
meaning and use of the correlation formulas given in this chapter will
have no difficulty with other variations which he may encounter.*

The formula for r by the difference method is

$$r = \frac{\Sigma x^2 + \Sigma y^2 - \Sigma d^2}{2\sqrt{\Sigma x^2 \times \Sigma y^2}} \tag{27}$$

(coefficient of correlation by difference-formula, deviations
from the means of the distributions)

in which $\Sigma d^2 = \Sigma (x - y)^2$.
The principal advantage of the difference-formula is that no cross
products (xy's) need be computed. For this reason, this formula is
employed in several of the printed correlation charts. Formula (27)
is illustrated in Table 22 with the same data used in Table 19 and
elsewhere in this chapter. Note that the x, y, x², and y² columns
repeat Table 18. The d or (x - y) column is found by subtracting
algebraically each y-deviation from its corresponding x-deviation.
These differences are then squared and entered in the d² or (x - y)²
column. Substitution of $\Sigma x^2$, $\Sigma y^2$, and $\Sigma d^2$ in formula (27) gives
r = .78.
* See Symonds, P. M., “Variations of the Product-Moment (Pearson)
Coefficient of Correlation,” Journal of Educ. Psych., 1926, 17, 458-469.

TABLE 22 To illustrate the calculation of r from ungrouped data by the
difference-formula, deviations from the means

          Test 1  Test 2                     d                          d²
Subject     X       Y       x       y     (x - y)     x²       y²    (x - y)²
A           50      22    -12.5    -8.4    -4.1     156.25    70.56    16.81
B           54      25     -8.5    -5.4    -3.1      72.25    29.16     9.61
C           56      34     -6.5     3.6   -10.1      42.25    12.96   102.01
D           59      28     -3.5    -2.4    -1.1      12.25     5.76     1.21
E           60      26     -2.5    -4.4     1.9       6.25    19.36     3.61
F           62      30      -.5     -.4     -.1        .25      .16      .01
G           61      32     -1.5     1.6    -3.1       2.25     2.56     9.61
H           65      30      2.5     -.4     2.9       6.25      .16     8.41
I           67      28      4.5    -2.4     6.9      20.25     5.76    47.61
J           71      34      8.5     3.6     4.9      72.25    12.96    24.01
K           71      36      8.5     5.6     2.9      72.25    31.36     8.41
L           74      40     11.5     9.6     1.9     132.25    92.16     3.61
                                                    595.00   282.92   234.92

M_X = 62.5    M_Y = 30.4

$$r = \frac{595.00 + 282.92 - 234.92}{2\sqrt{595 \times 282.92}} = .78 \tag{27}$$


Another form of the difference-formula is often useful, especially
in machine calculation. This version makes use of raw or obtained
scores:

$$r = \frac{N[\Sigma X^2 + \Sigma Y^2 - \Sigma (X - Y)^2] - 2(\Sigma X)(\Sigma Y)}{2\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} \tag{28}$$

(coefficient of correlation by difference-formula, calculation
from raw or obtained scores)

in which $\Sigma (X - Y)^2$ is the sum of the squared differences between
the two sets of scores.
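
Both difference-formulas can be tried on the twelve pairs of scores used throughout this chapter. The Python sketch below is illustrative only; the function names `r_difference_deviations` and `r_difference_raw` are assumptions. Neither function forms a cross product of the two series.

```python
from math import sqrt

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]  # X
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]  # Y


def r_difference_deviations(xs, ys):
    """Formula (27): deviations from the means, d = x - y."""
    n = len(xs)
    m_x, m_y = sum(xs) / n, sum(ys) / n
    dx = [x - m_x for x in xs]
    dy = [y - m_y for y in ys]
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(dx, dy))
    return (sum_x2 + sum_y2 - sum_d2) / (2 * sqrt(sum_x2 * sum_y2))


def r_difference_raw(xs, ys):
    """Formula (28): raw scores and squared raw differences."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    s_diff2 = sum((x - y) ** 2 for x, y in zip(xs, ys))
    numerator = n * (sx2 + sy2 - s_diff2) - 2 * sx * sy
    denominator = 2 * sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return numerator / denominator


r27 = r_difference_deviations(test1, test2)
r28 = r_difference_raw(test1, test2)
print(round(r27, 2), round(r28, 2))  # 0.78 0.78
```

The agreement follows from the identity $\Sigma (x - y)^2 = \Sigma x^2 + \Sigma y^2 - 2\Sigma xy$, which lets the cross products be recovered from the squared differences.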


3. Averaging coefficients of correlation 


It has been a fairly common practice to average correlation
coefficients computed from tests given to comparable groups in order to
obtain a generalized picture of the relationship between the two
variables. The averaging of r's is, however, a dubious and often an
incorrect procedure. In the first place, r's do not vary along a linear
scale so that the increase from .40 to .50 does not mean the same
increase in relationship as does an increase from .80 to .90. Secondly,
when +r's and -r's are averaged, they tend to cancel each other out.
If r's do not differ greatly in size, their arithmetic mean will yield a
useful result; but this is not true when r's differ widely in size or in
sign. Averaging an r of .70 and an r of .60 to obtain .65 is
permissible; but averaging an r of .90 and an r of .10 to obtain .50 is not.

The safest plan is not to average r's at all. When for various
reasons averaging seems to be demanded by the problem, the best
method is to transform the r's into Fisher's z-function (p. 426), and
take the arithmetic mean of the z's. The mean z may then be
converted into an equivalent r. An example will illustrate the procedure
to be followed in converting r's to z's.

Example (1) In 5 parallel experiments the following r's are
obtained between the same two variables: .50, .90, .40, .30, and .70.
What is the mean of these coefficients?

By Table C we may convert these r's into the following z's: .55,
1.47, .42, .31, and .87. The mean of these z's is .72, which is equivalent
to an r of .62. Comparison of this mean r with .56, the average of
the r's as they stand, gives an idea of the correction effected in
using z.
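
The conversion can be carried out without a table. The Python sketch below assumes that the tabled z is Fisher's transformation $z = \frac{1}{2}\log_e \frac{1+r}{1-r}$, which is the inverse hyperbolic tangent of r; Table C itself is not reproduced in this section, and the function name `mean_r_via_fisher_z` is illustrative.

```python
from math import atanh, tanh


def mean_r_via_fisher_z(rs):
    """Average correlation coefficients through Fisher's z."""
    zs = [atanh(r) for r in rs]       # r -> z
    mean_z = sum(zs) / len(zs)        # arithmetic mean of the z's
    return tanh(mean_z)               # mean z -> equivalent r


rs = [0.50, 0.90, 0.40, 0.30, 0.70]
mean_r = mean_r_via_fisher_z(rs)
plain_mean = sum(rs) / len(rs)
print(round(mean_r, 2), round(plain_mean, 2))  # 0.62 0.56
```

The z-based mean is pulled toward the high coefficient (.90) because equal steps in r near 1.00 correspond to larger steps in z.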


PROBLEMS 


1. Find the correlation between the two sets of scores given below, using
the ratio method (p. 126).

Subjects    X     Y
a          15    40
b          18    42
c          22    50
d          17    45
e          19    43
f          20    46
g          16    41
h          21    41


2. The scores given below were achieved upon Army Alpha and
Typewriting Tests by 100 students in a typewriting class. The typewriting
scores are in number of words written per minute, with certain
penalties. Find the coefficient of correlation. Use an interval of 5 units for
Y and an interval of 10 units for X.

Typing (Y) Alpha (X) Typing (Y) Alpha (X) Typing (Y) Alpha (X)


46 
31 
46 


152 


26 
33 
44 
35 
49 
40 


40 
36 
43 


120 


4 


3. In the correlation table given below compute the coefficient of
correlation.


BOYS: AGES 4.5 TO 5.5 YEARS
Weight in Pounds (X)
24-28 | 29-33 | 34-38 | 39-43 | 44-48 | 49-53 | Totals


4. In the following correlation table compute the coefficient of correlation. 
Army Alpha I.Q.'s 


School 5- |100-|105-|110- 
Marks 104 | 109 | 114 


90 and over
85-89
5. Compute the coefficient of correlation between the Algebra Test scores 
and I.Q.’s shown in the table below. 


ALGEBRA Test Scores 


6. Compute the correlation between the two sets of scores given below 
(a) when deviations are taken from the means of the two series [use 
formula (24)]; 
(b) when the means are taken at zero. First reduce the scores by
subtracting 150 from each of the scores in Test 1, and 40 from each of
the scores in Test 2.
Test 1 Test 2 Test 1 Test 2 
150 60 139 41 
126 40 155 43 
135 45 147 37 
176 50 162 58 
138 56 156 48
142 43 146 39 
151 57 133 81 
163 38 168 46 
137 41 153 59 
178 55 150 57 


7. Find the correlation between the two sets of memory-span scores given
below (the first series is arranged in order of size) (a) when deviations
are taken from assumed means [formula (22)], (b) by the
difference-method given on page 145.

Test 1 Test 2
(digit span) (letter span)
15 12
14 14
13 10 
12 8 
11 12 
11 9 
11 12 
10 8 
10 10 
10 9 
9 8 
9 7 
8 7 
7 8 
7 6 
ANSWERS

1. r = .65        5. r = .52
   r = -.05       6. r = .41
   r = .71        7. r = .78


REGRESSION AND PREDICTION 


I. The Regression Equations


1. The problem of predicting one variable from another 


Suppose that in a group of 120 college students (p. 129), we wish
to estimate a certain man's height knowing his weight to be 156
pounds. The best possible “guess” that we can make of this man’s
height is the mean height of all of the men who fall in the 150-159
weight-interval. In Figure 39 the mean height of the nine men in
this column is 68.9 inches, which is, therefore, the most likely height
of a man who weighs 153 pounds. In the same way, the most
probable height of a man who weighs 136 pounds is 66.6 inches, the mean
height of the thirty-seven men who fall in weight-column 130-139
pounds. And, in general, the most probable height of any man in the
group is the mean of the heights of all of the men who weigh the same
(or approximately the same) as he, i.e., who fall within the same
weight-column.

Turning to weight, we can make the same kind of estimates. Thus,
the best possible “guess” that we can make of a man’s weight
knowing his height to be 66.5 inches is 135.1 pounds, viz., the mean weight
of the thirty-three men who fall in the height-interval 66-67 inches.
Again, in general, the most probable weight of any man in the group
is the mean weight of all of the men who are of the same (or
approximately the same) height.

Our illustration shows that from the scatter diagram alone it is 
possible to “predict” one variable from another. But the prediction 
is rough, and is obviously subject to a large “error of estimate.” * 

* See page 161.

[Scatter diagram: Weight in Pounds (X), in class intervals 100-109
through 170-179, plotted against Height in Inches (Y), in two-inch class
intervals.]

FIG. 39 Illustrating positions of regression lines and calculation of the
regression equations (See Fig. 38, p. 135.)

r = .60
M_X = 136.3 pounds    σ_x = 15.54 pounds
M_Y = 66.5 inches     σ_y = 2.62 inches

For plotting on the chart, regression equations are written with σ_x and
σ_y in class-interval units, viz., $\bar{y} = .51x$ and $\bar{x} = .71y$ (see p. 154).

Calculation of Regression Equations

I. Deviation Form

$$(1) \quad \bar{y} = .60 \times \frac{2.62}{15.54}\, x = .10x \tag{29}$$

$$(2) \quad \bar{x} = .60 \times \frac{15.54}{2.62}\, y = 3.56y \tag{30}$$

II. Score Form

$$(1) \quad \bar{Y} - 66.5 = .10(X - 136.3) \quad \text{or} \quad \bar{Y} = .10X + 52.9 \tag{31}$$

$$(2) \quad \bar{X} - 136.3 = 3.56(Y - 66.5) \quad \text{or} \quad \bar{X} = 3.56Y - 100.4 \tag{32}$$

Calculation of Standard Errors of Estimate

$$\sigma_{(est.\ Y)} = 2.62\sqrt{1 - .60^2} = 2.10 \text{ inches} \tag{33}$$

$$\sigma_{(est.\ X)} = 15.54\sqrt{1 - .60^2} = 12.43 \text{ pounds} \tag{34}$$

Moreover, while we have made use of the fact that the means are the
most probable points in our arrays (columns or rows), we have made
no use of our knowledge concerning the over-all relationship between
the two variables. The two regression lines in Figure 39 are
determined by the correlation between height and weight and their degree
of separation indicates the size of the correlation coefficient* (p. 131).
Consequently, they describe more regularly, and in a more
generalized fashion than do the series of short straight lines joining the
means, the relationship between height and weight over the whole
range (see also p. 153). A knowledge of the equations of these lines is
necessary if we are to make a prediction based upon all of our data.
Given the weight (X) of a man comparable to those in our group,
for example, if we substitute for X in the equation connecting Y and
X we are able to predict this man's height more accurately than if
we simply took the mean of his height array. The task of the next
section will be to develop equations for the two regression lines by
means of which predictions from X to Y or from Y to X can be made.


2. The two regression equations in deviation form 


The equations of the two regression lines in a correlation table
represent the straight lines which “best fit” the means of the successive
columns and rows in the table. Using as a definition of “best fit” the
criterion of “least squares,”† Pearson worked out the equation of
the line which goes through, or as close as possible to, more of the
column-means than any other straight line; and the equation of the
line which goes through, or as close as possible to, more of the
row-means than any other straight line. These two lines are “best fitting”
in a mathematical sense, the one to the observations of the columns
and the other to the observations of the rows.


The equation of the first regression line, the line drawn to
represent the trend of the crosses in Figure 39, is as follows:

$$\bar{y} = r \frac{\sigma_y}{\sigma_x} \times x \tag{29}$$

(regression equation of y on x, deviations taken from
the means of Y and X)

The factor $r \frac{\sigma_y}{\sigma_x}$ is called the regression coefficient, and is often
replaced in (29) by the term $b_{yx}$ or $b_{12}$, so that formula (29) may be
written $\bar{y} = b_{yx} \times x$, or $\bar{y} = b_{12} \times x$. The bar over the $y$ means that
our estimate is an average value.

* The term “regression” was first used by Francis Galton with reference to the
inheritance of stature. Galton found that children of tall parents tend to be
less tall, and children of short parents less short, than their parents. In other
words, the heights of the offspring tend to “move back” toward the mean
height of the general population. This tendency toward maintaining the “mean
height” Galton called the principle of regression, and the line describing the
relationship of height in parent and offspring was called a “regression line.”
The term is still employed, although its original meaning of “stepping back” to
some stationary average is not necessarily implied (see p. 171).

† For an elementary mathematical treatment of the method of least squares
as applied to the problem of fitting regression lines, see Walker, H. M.,
Elementary Statistical Method (New York: Henry Holt and Co., 1943), pp. 308-

If we substitute in formula (29) the values of $r$, $\sigma_y$, and $\sigma_x$
obtained from Figure 39, we have

$$\bar{y} = .60 \times \frac{2.62}{15.54}\, x, \quad \text{or} \quad \bar{y} = .10x$$

This equation gives the relationship of deviations from mean height
to deviations from mean weight. When $x = +1.00$, $\bar{y} = +.10$; and
a deviation of one pound from the mean of the X’s (weight) is
accompanied by a deviation of .10 inch from the mean of the Y’s (height).
The man who stands one pound above the mean weight of the group,
therefore, is most probably .10 inch above the mean height. Since
this man’s weight is 137.3 pounds (136.3 + 1.00), his height is most
probably 66.6 inches (66.5 + .10). Again, the man who weighs 120
pounds, i.e., is 16.3 pounds below the mean of the group, is most
probably 64.9 inches tall, or about 1.6 inches below the mean height
of the group. To get this last value, substitute $x = -16.3$ in the
equation above to get $\bar{y} = -1.63$, and refer this value to its mean. The
regression equation is a generalized expression of relationship. It
tells us that the most probable deviation of an individual in our group
from the $M_{ht.}$ is just .10 of his deviation from the $M_{wt.}$.


The equation $\bar{y} = r \frac{\sigma_y}{\sigma_x}\, x$ gives the relationship between Y and X
in deviation form. This designation is appropriate since the two
variables are expressed as deviations from their respective means
(i.e., as $x$ and $y$); hence, for a given deviation from $M_X$ the equation
gives the most probable accompanying deviation from $M_Y$.

The equation of the second regression line, the line drawn through
the circles (i.e., the means) of the rows in Figure 39, is

$$\bar{x} = r \frac{\sigma_x}{\sigma_y} \times y \tag{30}$$

(regression equation of x on y, deviations taken from
the means of X and Y)

As in the first regression equation, the regression coefficient $r \frac{\sigma_x}{\sigma_y}$
is often replaced by the expression $b_{xy}$ or $b_{21}$, and formula (30)
written $\bar{x} = b_{xy} \times y$ or $\bar{x} = b_{21} \times y$.

If we substitute for $r$, $\sigma_x$, and $\sigma_y$ in formula (30), we have

$$\bar{x} = .60 \times \frac{15.54}{2.62}\, y, \quad \text{or} \quad \bar{x} = 3.56y$$

from which it is evident that a deviation of 1 inch from the $M_{ht.}$, or
from 66.5 inches, is accompanied by a deviation of 3.56 pounds from
the $M_{wt.}$, or from 136.3 pounds. Expressed generally, the most
probable deviation of any man from the mean weight is just 3.56 times
his deviation from the mean height. Accordingly, a man 67 inches
tall or .5 inch above the mean height (66.5 + .5 = 67) most probably
weighs 138.1 pounds, or is 1.8 pounds above the mean weight
(136.3 + 1.8). (Substitute $y = .5$ in the equation; then $\bar{x} = 1.8$.)


Equation $\bar{x} = r \frac{\sigma_x}{\sigma_y}\, y$ gives the relationship between X and Y in
deviation form. That is to say, it gives the most probable deviation
of an X-measure from $M_X$ corresponding to a known deviation in the
Y-measure from $M_Y$.
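
The two deviation-form equations can be applied numerically. The Python sketch below uses the constants quoted from Figure 39 (r = .60, σ_x = 15.54, σ_y = 2.62, M_X = 136.3, M_Y = 66.5); the function names `predict_y_dev` and `predict_x_dev` are illustrative assumptions.

```python
r = 0.60
sigma_x = 15.54   # weight (X), pounds
sigma_y = 2.62    # height (Y), inches
mean_x = 136.3    # mean weight
mean_y = 66.5     # mean height

b_yx = r * sigma_y / sigma_x   # regression coefficient in formula (29)
b_xy = r * sigma_x / sigma_y   # regression coefficient in formula (30)


def predict_y_dev(x_dev):
    """Formula (29): most probable height deviation for a weight deviation."""
    return b_yx * x_dev


def predict_x_dev(y_dev):
    """Formula (30): most probable weight deviation for a height deviation."""
    return b_xy * y_dev


# A man weighing 120 pounds: deviation of -16.3 pounds from the mean.
height_120 = mean_y + predict_y_dev(120 - mean_x)
# A man 67 inches tall: deviation of +.5 inch from the mean.
weight_67 = mean_x + predict_x_dev(67 - mean_y)

print(round(b_yx, 2), round(b_xy, 2))          # 0.1 3.56
print(round(height_120, 1), round(weight_67, 1))  # 64.9 138.1
```

Each function predicts in one direction only; feeding a height deviation to `predict_y_dev` would misuse the equation, which is the point taken up in the next paragraph.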

Although both regression equations given above involve $x$ and $y$,
the two equations cannot be used interchangeably; neither can be
used to predict both $x$ and $y$. This is an important fact which the
student must understand clearly and constantly bear in mind. The
first regression equation $\bar{y} = r \frac{\sigma_y}{\sigma_x} \times x$ can be used only when $y$ is to be
predicted from a given $x$ (when $y$ is the “dependent” variable)*;
while the second equation $\bar{x} = r \frac{\sigma_x}{\sigma_y}\, y$ can be used only when $x$ is to
be predicted from a known $y$ (when $x$ is the “dependent” variable).
There are always two regression equations in a correlation table,
the one through the means of the columns and the other through the
means of the rows, unless the correlation is 1.00 or -1.00. When
$r = 1.00$, $\bar{y} = r \frac{\sigma_y}{\sigma_x} \times x$ becomes $\bar{y} = \frac{\sigma_y}{\sigma_x} \times x$, or $y\sigma_x = x\sigma_y$. Also, when
$r = 1.00$, $\bar{x} = r \frac{\sigma_x}{\sigma_y} \times y$ becomes $\bar{x} = \frac{\sigma_x}{\sigma_y}\, y$, or $x\sigma_y = y\sigma_x$. In short,
when the correlation is perfect (±1.00), the two equations are
identical and the two regression lines coincide. To illustrate this
situation, suppose that the correlation between height and weight in
Figure 39 were perfect. The first regression equation would then be
$\bar{y} = 1.00 \times \frac{2.62}{15.54}\, x$, or $\bar{y} = .17x$, and the second, $\bar{x} = 1.00 \times \frac{15.54}{2.62}\, y$, or
$\bar{x} = 5.93y$. Algebraically, the equation $x = 5.93y$ is equal to $y = .17x$;
for if we put $x = y/.17$ (carrying the unrounded ratio 2.62/15.54), then
$x = 5.93y$. When $r = \pm 1.00$ there is only one
equation and a single regression line. Moreover, if $r = \pm 1.00$, and in
addition $\sigma_x = \sigma_y$, the single regression line makes an angle of 45° or
135° with the horizontal axis, since $y = \pm x$.

* The dependent variable takes its value from the other (independent)
variable in the equation. For example, in the equation y = Bx, y depends
for its value upon x; hence y is the dependent variable.


3. Plotting the regression lines in a correlation table*

In Figure 39, the coordinate axes have been drawn in on the
correlation table through the means of the X- and Y-distributions. The
vertical axis is drawn through 136.3 pounds ($M_{wt.}$), and the
horizontal axis through 66.5 inches ($M_{ht.}$). These axes intersect close to the
center of the chart. Equations (29) and (30) define straight lines
which pass through the origin or point of intersection of these
coordinate axes. It is a comparatively simple task to plot in our
regression lines on the correlation chart with reference to the given
coordinate axes.

FIG. 40 Plot of the straight line, y = 2x

* A brief review of the equation of a straight line, and of the method of
plotting a simple linear equation, is given here in order to simplify the plotting
of the regression equations.
In Figure 40, let X and Y be coordinate axes, or axes of reference. Now
suppose that we are given the equation y = 2x and are required to represent
the relation between x and y graphically. To do this we assign values to x in
the equation and compute the corresponding values of y. When x = 2, for
example, y = 2 × 2 or 4; when x = 3, y = 2 × 3 or 6. In the same way, given any
x-value we can compute the value of y which will “satisfy” the equation, that
is, make the left side equal to the right. If the series of x and y values found
from the equation are plotted on the diagram with respect to the X- and
Y-coordinates (as in Fig. 40) they will be found to fall along a straight line.
This straight line pictures the relation y = 2x. It goes through the origin, since
when x = 0, y = 0. The equation y = 2x represents, then, a straight line which
passes through the origin; and the relation of its coordinates (points lying along
the line) is such that $\frac{y}{x}$, called the slope of the line, is always equal to 2.

The general equation of any straight line which passes through the origin may
be written y = mx, where m is the slope of the line. If we replace m in the
general formula by $r \frac{\sigma_y}{\sigma_x}$ it is clear that the regression line in deviation form,
namely, $\bar{y} = r \frac{\sigma_y}{\sigma_x}\, x$, is simply the equation of a straight line which goes through
the origin. For the same reason, when the general equation of a straight line
through the origin is written x = my, $\bar{x} = r \frac{\sigma_x}{\sigma_y}\, y$ is also seen to be a straight line
through the origin, its slope being $r \frac{\sigma_x}{\sigma_y}$.

Correlation charts are usually laid out with equal distances
representing the X and Y class-intervals (the printed correlation charts
are always so constructed) although the intervals expressed in terms
of the variables themselves may be, and often are, unequal and
incommensurable. This is true in Figure 39. In this diagram, the
intervals in X and Y appear to be equal, although the actual interval for
height is 2 inches, and the actual interval for weight is 10 pounds.
Because of this difference in interval-length in the two variables it
is very important that we express $\sigma_x$ and $\sigma_y$ in our regression
equations in class-interval units before plotting the regression lines on the
chart. Otherwise we must equate our X and Y intervals by laying
out our diagram in such a way as to make the X-interval five times
the Y-interval. This latter method of equating intervals is
impractical, and is rarely used, since all we need do in order to use
correlation charts drawn up with equal intervals is to express $\sigma_x$ and $\sigma_y$ in
formulas (29) and (30) in units of interval. When this is done, and
the interval, not the score, is the unit, the first regression equation
becomes

$$\bar{y} = .60 \times \frac{1.31}{1.55}\, x, \quad \text{or} \quad \bar{y} = .51x$$

and the second

$$\bar{x} = .60 \times \frac{1.55}{1.31}\, y, \quad \text{or} \quad \bar{x} = .71y$$


Since each regression line goes through the origin, only one other
point (besides the origin) is needed in order to determine its course.
In the first regression equation, if $x = 10$, $\bar{y} = 5.1$; and the two points
(0, 0) and (10, 5.1) locate the line. In the second regression
equation, if $y = 10$, $\bar{x} = 7.1$; and the two points (0, 0) and (7.1, 10)
determine the second line. In plotting points on a diagram any convenient
scale may be employed. A millimeter rule is useful.
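
The plotting coefficients follow from dividing each σ by its class interval. The short Python sketch below is an illustration with assumed variable names; it uses the intervals stated above (10 pounds for weight, 2 inches for height) and finds the second point needed to draw each line.

```python
r = 0.60
sigma_x = 15.54      # weight, pounds
sigma_y = 2.62       # height, inches
interval_x = 10      # class interval for weight, pounds
interval_y = 2       # class interval for height, inches

# The sigmas expressed in class-interval units.
sigma_x_units = sigma_x / interval_x   # about 1.55
sigma_y_units = sigma_y / interval_y   # 1.31

# Slopes of the two regression lines on the equal-interval chart.
slope_y_on_x = r * sigma_y_units / sigma_x_units
slope_x_on_y = r * sigma_x_units / sigma_y_units

# One point besides the origin fixes each line.
first_line_point = (10, slope_y_on_x * 10)     # (x, y) for y on x
second_line_point = (slope_x_on_y * 10, 10)    # (x, y) for x on y

print(round(slope_y_on_x, 2), round(slope_x_on_y, 2))  # 0.51 0.71
```

These slopes describe positions on the chart only; they are not the score-unit coefficients .10 and 3.56.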

It is important for the student to remember that when the two σ's
are expressed in interval units, regression equations do not give the
relationship between the X and Y score deviations. These special
forms of the regression equations should not be used except when
plotting the equations on a correlation chart. Whenever the most
probable deviation in the one variable corresponding to a known
deviation in the other is wanted, formulas (29) and (30), in which
the σ’s are expressed in score units, must be employed.


4. The regression equations in score form 


In the last sections it was pointed out that formulas (29) and (30)
give the equations of the regression lines in deviation form, that is,
that values of $x$ and $y$ substituted in these equations are deviations from
the means of the X and Y distributions, and are not scores. While
the equations in deviation form are actually all that one needs in
order to pass from one variable to another, it is decidedly convenient
to be able to estimate an individual’s actual score in Y, say, directly
from the score in X without first converting the X-score into a
deviation from $M_X$. This can be done by using the score form of the
regression equation. The conversion of deviation form to score form
is made as follows: Denoting the mean of the Y’s by $M_Y$ and any
Y-score simply by Y, we may write the deviation of any individual
from the mean as $Y - M_Y$ or, in general, $y = Y - M_Y$. In the same
way, $x = X - M_X$ when $x$ is the deviation of any X-score from the
mean X. If we substitute $Y - M_Y$ for $y$, and $X - M_X$ for $x$, in
formulas (29) and (30), the two regression equations become

$$\bar{Y} - M_Y = r \frac{\sigma_y}{\sigma_x}(X - M_X) \quad \text{or} \quad \bar{Y} = r \frac{\sigma_y}{\sigma_x}(X - M_X) + M_Y \tag{31}$$

and

$$\bar{X} - M_X = r \frac{\sigma_x}{\sigma_y}(Y - M_Y) \quad \text{or} \quad \bar{X} = r \frac{\sigma_x}{\sigma_y}(Y - M_Y) + M_X \tag{32}$$

(regression equations of Y on X and X on Y in score form)


These two equations are said to be in score form, since the X and Y in
both equations represent actual scores, and not deviations from the
means of the two distributions.

If we substitute in (31) the values of $M_Y$, $r$, $\sigma_y$, $\sigma_x$, and $M_X$ obtained
from Figure 39, the regression of height on weight in score form
becomes

$$\bar{Y} = .60 \times \frac{2.62}{15.54}(X - 136.3) + 66.5$$

or upon reduction

$$\bar{Y} = .10X + 52.9$$

To illustrate the use of this equation, suppose that a man in our
group weighs 160 pounds and we wish to estimate his most probable
height. Substituting 160 for X in the equation, $\bar{Y} = 69$ inches; and
accordingly, the most probable height of a man who weighs 160
pounds is 69 inches.

If the problem is to predict weight instead of height, we must use
the second regression equation, formula (32). Substituting for M_x,
r, σ_x, σ_y, and M_y in (32) we have

$$X = .60 \times \frac{15.54}{2.62}\,(Y - 66.5) + 136.3$$

or

$$X = 3.56Y - 100.4$$

Now if a man is 71 inches tall, we find, on replacing Y by 71 in the
equation, that X = 152.4. Hence the most probable weight of a man
who is 71 inches tall is about 152½ pounds.
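
The substitutions in formulas (31) and (32) can also be checked by
machine. What follows is a minimal Python sketch and not part of the
original solution: the function name score_form and its argument names
are illustrative assumptions, and the constants are the values the text
reads from Figure 39.

```python
def score_form(r, sd_pred, sd_given, mean_pred, mean_given):
    """Return (slope, intercept) of a score-form regression line.

    Hypothetical helper: predicted = slope * given + intercept, with
    slope = r * sd_pred / sd_given, as in formulas (31) and (32).
    """
    slope = r * sd_pred / sd_given
    intercept = mean_pred - slope * mean_given
    return slope, intercept


# Values taken from the text (Figure 39): X = weight in pounds,
# Y = height in inches.
M_X, SD_X = 136.3, 15.54
M_Y, SD_Y = 66.5, 2.62
R = 0.60

# Formula (31): height predicted from weight.
b_yx, a_yx = score_form(R, SD_Y, SD_X, M_Y, M_X)
height_at_160 = b_yx * 160 + a_yx

# Formula (32): weight predicted from height.
b_xy, a_xy = score_form(R, SD_X, SD_Y, M_X, M_Y)
weight_at_71 = b_xy * 71 + a_xy

print(round(height_at_160, 1), round(weight_at_71, 1))
```

Carried to full precision, the sketch gives values a fraction below 69
inches and 152.4 pounds; the small differences arise because the text
rounds the coefficients to .10 and 3.56 before substituting.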


5. The meaning of a “prediction” from the regression equation 


It may seem strange, perhaps, to talk of “predicting” a man’s 
height from his weight, when the heights and weights of the 120 men 
in our group are already known. When we have measures of both 
height and weight it is unnecessary, of course, to estimate one from 
the other. But suppose that all we know about a given individual is 
his weight and the fact that he falls within the age-range of our group 
of 120 men. Since we know the correlation between height and 
weight to be .60, it is possible from the regression equation to predict 
the most probable height of our subject in lieu of actually measuring 
him. Furthermore, the regression equation may be employed to
estimate the height of any man in the population from which our
group is chosen, provided our sample is an unbiased selection from
the larger group. A regression equation holds, of course, only for the
population from which the sample group was drawn. We cannot
estimate the heights of children or of women from a regression
equation which describes the relationship between height and weight
in men between the ages of eighteen and twenty-five years (the
age-range of the students in our group). Conversely, we cannot expect
a regression equation established for elementary-school children to
hold for older groups.

Height and weight, since they are both easily measured, perhaps 
do not demonstrate the value of the regression equation so clearly as 
do other and more complex traits. These variables were chosen for 
our “model” problem because they are objective and observable and 
their meaning is definite. Let us now consider a problem of more
direct psychological interest. Suppose that in a group of 300
high-school children of nearly the same age, the correlation between a
group intelligence test given at the beginning of the school year and
average grades made in the first year of high school is .60. Now if we
administer the group test to a child who enters school the next year,
it is possible from his score to estimate his probable scholastic
performance by means of the regression equation between test score and
grades obtained from the previous year's class. Forecasts of this
sort are useful in educational prognosis and guidance.* The same
is true of vocational guidance; we are often able to predict from a
test battery the probable success of an individual who contemplates
entering a certain trade or profession.† Advice on such a basis is
measurably better than subjective judgment.


II. The Reliability of Predictions ‡


1. The standard error of estimate


The values of X and Y “predicted” from regression equations have
been constantly referred to as being the “most probable” values of
the one variable accompanying the given value of the other. In order
to show just how probable such estimates are it is necessary that we
calculate their standard errors of estimate. The accuracy with which
we are able to predict Y-scores from equation (31) is given by the
formula

$$\sigma_{(est.\ Y)} = \sigma_y \sqrt{1 - r^2} \tag{33}$$
[standard error of a Y-score predicted from equation (31)]

in which σ_y is the σ of the Y distribution, and r is the coefficient of
correlation. The subscript “est.” is used to distinguish this standard
error from the σ of the distribution.

* Edgerton, H. A., Academic Prognosis in the University, Educational
Psychology Monographs, 1930, 27.
† Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques
(New York: American Book Co., 1940).
‡ This section may be omitted until after Chapter 8.

From formula (31) we have calculated the most probable height
of a man weighing 160 pounds to be 69 inches. The reliability of this
prediction is obtained by substituting σ_y and r in formula (33)
to find

$$\sigma_{(est.\ Y)} = 2.62\sqrt{1 - .60^2} = 2.1 \text{ inches}$$

We now say that the most probable height of a man weighing 160
pounds is 69 inches with a σ(est.) of 2.1 inches; and that the chances
are about two in three that our prediction does not miss the man's
actual height by more than ±2.1 inches. We may feel quite certain
that the estimated height of this man does not miss his true height
by more than ±3σ(est.), or by more than ±6.3 inches (p. 185).

The degree of accuracy with which X-scores can be predicted from
(32) is given by the formula

$$\sigma_{(est.\ X)} = \sigma_x \sqrt{1 - r^2} \tag{34}$$
[standard error of an X-score predicted from equation (32)]

in which σ_x is the σ of the X distribution, and r is the coefficient of
correlation.

We found on page 160 that the most probable weight of a man in
our group who is 71 inches tall is 152.4 pounds. The σ(est.) of this
prediction from (34) is

$$\sigma_{(est.\ X)} = 15.54\sqrt{1 - .60^2} = 12.4 \text{ pounds}$$

and the most probable weight of any man 71 inches tall, in our group
or in the population from which our sample was drawn, is 152.4
pounds with a σ(est.) of 12.4 pounds. The chances, therefore, are
about two in three that our prediction does not miss our man's true
weight by more than ±12.4 pounds.
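
As a check on the arithmetic of formulas (33) and (34), the two
standard errors of estimate may be computed as in the short Python
sketch below. The function name se_estimate and the variable names are
assumptions made for illustration; the σ's and r are those quoted in
the text for the height-weight problem.

```python
import math


def se_estimate(sd, r):
    """Standard error of estimate: sd * sqrt(1 - r^2), formulas (33), (34)."""
    return sd * math.sqrt(1.0 - r * r)


se_height = se_estimate(2.62, 0.60)    # inches, predicting Y from X
se_weight = se_estimate(15.54, 0.60)   # pounds, predicting X from Y

# Limits of +/-1 and +/-3 standard errors about the predicted height
# of 69 inches (the "two in three" and "quite certain" limits).
pred_height = 69.0
band_1 = (pred_height - se_height, pred_height + se_height)
band_3 = (pred_height - 3 * se_height, pred_height + 3 * se_height)

print(round(se_height, 1), round(se_weight, 1))
print([round(v, 1) for v in band_1], [round(v, 1) for v in band_3])
```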


2. The accuracy of individual predictions from regression equations 


The formulas for σ(est.) measure the error made in taking predicted,
instead of actual, X and Y measures. If r = 1.00, √(1 − r²) is
0, and σ(est.) is zero: there is no error of estimate and each person's
measurement is predicted exactly. On the other hand, when r = .00,
√(1 − r²) = 1.00, and the error of estimate is equal to the σ of the
distribution into which prediction is made. When this last situation
occurs, the regression equation is of no value in enabling us the better
to predict scores, as each person's most probable score (e.g., X) is
simply the mean (i.e., M_x). When r = .00 all that we can say
definitely is that a subject's score lies somewhere in the distribution
of Y's or X's. But just where we cannot tell, since our SE * of estimate
equals the SD of the test.

It is clear from formulas (33) and (34) that the accuracy of
prediction from a regression equation depends directly upon the σ
of the distribution (σ_y or σ_x) and upon the degree of correlation
between the two sets of measures. If the variability (σ_y) of Y is
small, and the correlation between Y and X high (e.g., .90), values of
Y can be predicted from known values of X with a comparatively
high degree of accuracy. However, when the variability of a test is
large, or the correlation low (or when both conditions exist),
prediction from regression equations becomes so unreliable as to be
almost valueless. Even when the correlation is fairly high, forecasts
will often have an uncomfortably large error of estimate. Thus we have
seen that in spite of the r = .60 between height and weight (Fig. 39),
our forecast of a man's weight, knowing his height, has a σ(est. X) of
about 12 pounds (p. 162). Predicted heights will, in two-thirds of the
cases, be in error by not more than 2 inches. An example in which
high correlation offsets fairly large variability, permitting reasonably
accurate forecasts, is given later in Figure 41, page 169.

When an investigator uses the regression equations for purposes
of prediction, he should always give the σ(est.) of his estimated scores.
The value of a forecast depends, first of all, upon the size of the error
of estimate; but it also depends upon the units of measurement, and
upon the purposes for which the prediction is made (p. 186).


3. The accuracy of group predictions 


We have seen that the standard error of a predicted score,
σ(est.), may often be uncomfortably large. Only when r = 1.00 is
σ(est.) = .00, and only then can an estimate be made without error.
The correlation coefficient must be .87 before √(1 − r²) is .50, i.e.,
before the standard error of estimate is reduced 50% below the σ of
the test. Obviously, unless r is quite large (larger than we usually
get in practice) the regression equation is of little aid in forecasting
with reasonable accuracy what a given individual may be expected
to do (p. 162). This fact has led many to discount unwisely the value
of correlation in prediction and to conclude that the calculation of r
is not worth the trouble.

* SE = standard error.
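
The statement that r must reach .87 before the standard error of
estimate is halved can be verified directly. The Python sketch below
is illustrative only; the names se_ratio and r_needed are assumptions
introduced here, and the list of r's is arbitrary.

```python
import math


def se_ratio(r):
    """sqrt(1 - r^2): the standard error of estimate as a fraction of the SD."""
    return math.sqrt(1.0 - r * r)


def r_needed(ratio):
    """Correlation at which the SE of estimate equals `ratio` times the SD."""
    return math.sqrt(1.0 - ratio * ratio)


# How slowly the error of estimate shrinks as r rises.
for r in (0.20, 0.40, 0.60, 0.80, 0.87, 0.95):
    print(f"r = {r:.2f}   SE(est) / SD = {se_ratio(r):.2f}")

# Correlation required to cut the error of estimate to half the SD.
print(round(r_needed(0.50), 2))
```

Because the ratio depends on r², it falls off slowly at first: an r of
.60 still leaves the error of estimate at four-fifths of the SD.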

Fortunately correlation makes out better in forecasting the per- 
formance of groups than in predicting the most likely achievement 
of a given individual. In forecasting achievement the psychologist 
is in much the same position as the insurance statistician or actuary. 
The actuary cannot tell how long John Smith, aged twenty, will live. 
But from his tables, he can tell quite accurately how many of 10,000 
men now aged twenty will live to be thirty, forty, or fifty years old. 
In the same way, the psychologist may be quite uncertain concern- 
ing the performance of a given individual. But knowing the correla- 
tion between a test (or test battery) and some criterion of perform- 
ance, he can forecast, often with considerable accuracy, the probable 
performance of various groups chosen from his distribution of test 
scores. The degree of accuracy in such predictions depends upon the 
size of the correlation coefficient. 

To illustrate “actuarial” prediction in psychology, suppose that 
70% of a freshman class of 400 men achieve grades in their college 
work above the minimum passing mark and hence are regarded as 
“satisfactory” students. Suppose, further, that the correlation be- 
tween a standard intelligence test and freshman performance is .50. 
Now if we had selected the upper half of our group (i.e., the 200 stu- 
dents who performed best on the intelligence test) at the beginning 
of the term, how many of these 200 would have been “satisfactory,” 
i.e., in the upper 70% of the grades distribution? From Table 23 it 
can easily be read that 84% of our 200 selected freshmen (i.e., 168) 
should be found in the satisfactory group with respect to grades. 
The entry .84 is found in column .50 (percentage of test distribution 
chosen) opposite the correlation of .50. This result should be com- 
pared with the 70% (i.e., 140) who might be expected to fall in the 
satisfactory group when selection is by “guess,” without knowledge 
of the correlation. This entry is in column .50 opposite the r of .00. 

The probable performance of other and smaller groups chosen
from our test distribution can be estimated with much greater
accuracy from Table 23. We know, for example, that 91% of the
best 20% of our students (roughly, seventy-three in the first eighty)
can be expected to prove satisfactory in terms of our criterion (i.e.,
being located in the upper 70% of the grade distribution). Read the
entry .91 in column .20 opposite r = .50. If the correlation of the
intelligence test and school grades had been .60 instead of .50, 87%
(174 in 200) of the “best half” according to the test would have been
satisfactory students; and 95% of the “best” 20% on the test should
be satisfactory students. These forecasts are to be compared with
70%, the estimate when r = .00. It is clear that a knowledge of the
correlation greatly improves the estimate, and the larger the r the
better the forecast.

TABLE 23 *  Proportion of students considered satisfactory in terms of
grades = .70

Selection Ratio: Proportion Selected on Basis of Tests

  r     .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95

 .00    .70   .70   .70   .70   .70   .70   .70   .70   .70   .70   .70
 .05    .73   .73   .72   .72   .72   .71   .71   .71   .71   .70   .70
 .10    .77   .76   .75   .74   .73   .73   .72   .72   .71   .71   .70
 .15    .80   .79   .77   .76   .75   .74   .73   .73   .72   .71   .71
 .20    .83   .81   .79   .78   .77   .76   .75   .74   .73   .71   .71
 .25    .86   .84   .81   .80   .78   .77   .76   .75   .73   .72   .71
 .30    .88   .86   .84   .82   .80   .78   .77   .75   .74   .72   .71
 .35    .91   .89   .86   .83   .82   .80   .78   .76   .75   .73   .71
 .40    .93   .91   .88   .85   .83   .81   .79   .77   .75   .73   .72
 .45    .94   .93   .90   .87   .85   .83   .81   .78   .76   .73   .72
 .50    .96   .94   .91   .89   .87   .84   .82   .80   .77   .74   .72
 .55    .97   .96   .93   .91   .88   .86   .83   .81   .78   .74   .72
 .60    .98   .97   .95   .92   .90   .87   .85   .82   .79   .75   .73
 .65    .99   .98   .96   .94   .92   .89   .86   .83   .80   .75   .73
 .70   1.00   .99   .97   .96   .93   .91   .88   .84   .80   .76   .73
 .75   1.00  1.00   .98   .97   .95   .92   .89   .86   .81   .76   .73
 .80   1.00  1.00   .99   .98   .97   .94   .91   .87   .82   .77   .73
 .85   1.00  1.00  1.00   .99   .98   .96   .93   .89   .84   .77   .74
 .90   1.00  1.00  1.00  1.00   .99   .98   .95   .91   .85   .78   .74
 .95   1.00  1.00  1.00  1.00  1.00   .99   .98   .94   .86   .78   .74
1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00   .88   .78   .74

Table 23 is a small part of a larger table in which “proportions
considered satisfactory in achievement” range from .05 to .95. The
correlation between test score and performance ranges from .00 to
1.00. These tables are strictly accurate only when the distributions
are normal both in the test and in the criterion of performance. They
may be used with considerable confidence when the distributions are
approximately normal, especially when the N's are large; and in any
case they furnish useful information.

* Taylor, H. C., and Russell, J. T., “The Relationships of Validity
Coefficients to the Practical Effectiveness of Tests in Selection:
Discussion and Tables,” Journal of Applied Psychology, 1939, 23, 565-578.

Forecasting tables have considerable value in selecting personnel
for business or other vocations. First, we must determine what
proportion of a given group of workers is to be considered “successful.”
With this information in hand and knowing the correlation between
our test battery and performance in the given activity, we may
forecast the probable success of groups of new applicants from their
test scores. Assume, for example, that 70% of a group of factory workers
are regarded as “acceptable workers,” acceptability having been
determined from ratings by foremen, number of pieces done in a
given time, or time taken to complete certain standard jobs. Assume,
further, that a test battery has a correlation of .45 with
worker-performance. Then if we select the best twenty out of 100
applicants (“best” according to our tests), we find from Table 23 that
90% of this number or eighteen should be acceptable workers. If we had
had no test and had simply selected the first twenty applicants to
appear (or any twenty), 70% or fourteen should be acceptable. Use of the
tests improves our forecast 30%; and the more stringent the criterion
of acceptability the greater the improvement in forecast made by the
tests.
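
Entries of the kind read from Table 23 can be approximated numerically
if one accepts the assumption, noted above, that test and criterion are
normally distributed together. The Python sketch below is offered only
as an illustration under that assumption and is not the procedure by
which the published tables were prepared; the function name
proportion_satisfactory, its arguments, and the step count are
hypothetical choices, and the method used (Simpson's rule over the
selected part of the test distribution) is a substitute for table
look-up. It requires r numerically less than 1.00.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def proportion_satisfactory(r, selection_ratio, base_rate, steps=4000):
    """P(criterion above its cutoff | test above its cutoff), bivariate normal.

    Integrates phi(x) * P(Y > y_cut | X = x) over x > x_cut by Simpson's
    rule and divides by the selection ratio. Assumes |r| < 1 and an even
    number of steps.
    """
    x_cut = _N.inv_cdf(1.0 - selection_ratio)   # cutoff on the test
    y_cut = _N.inv_cdf(1.0 - base_rate)         # cutoff on the criterion
    resid_sd = (1.0 - r * r) ** 0.5             # SD of Y about the regression line

    def integrand(x):
        return _N.pdf(x) * (1.0 - _N.cdf((y_cut - r * x) / resid_sd))

    upper = 10.0                                # area beyond 10 SD is negligible
    h = (upper - x_cut) / steps
    total = integrand(x_cut) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    return (total * h / 3.0) / selection_ratio


# The factory-worker example: r = .45, best 20 of 100, 70% acceptable.
print(round(proportion_satisfactory(0.45, 0.20, 0.70), 2))
# The freshman example: r = .50, upper half selected, 70% satisfactory.
print(round(proportion_satisfactory(0.50, 0.50, 0.70), 2))
```

Values computed in this way should agree with the tabled entries to
within about a unit in the second decimal place.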


III. The Effect of Variability of Scores upon the Size of r


Suppose that the correlation between two tests in a group of 50 
sixth-grade children has been found to be .50. How will this correla- 
tion compare with that between the same tests in a group of greater 
range, e.g., a group of 200 children spread over grades 6, 7, and 8? 
More generally, knowing the correlation between two tests in a group
of narrow range of talent, can we predict the probable correlation in
a group of wider range of talent?

The problem of the effect upon r of the “range of talent” (size of
σ_x and σ_y) within the group being studied often arises in correlational
work. It becomes important, for example, when one wishes to go
beyond the correlation obtained in the sample with which one is
working and to generalize (estimate the r) for a group of wider
range; or when r's between the same tests obtained in different
ranges are to be compared. A formula for estimating the correlation
between two tests in a heterogeneous group when we know the
correlation between the tests in a homogeneous group may be developed
in the following way: Let σ(est. y_c) be the standard error of estimate
in a group somewhat curtailed in variability or in range of talent;
and σ(est. y_u) be the standard error of estimate in a larger group less
restricted in variability. (Y is the dependent variable, p. 156.) Then,
on the assumption that our tests are as effective in the wide as in the
narrow range, σ(est. y_c) = σ(est. y_u); or, by formula (33), p. 162,

$$\sigma_{y_c}\sqrt{1 - r_{x_c y_c}^2} = \sigma_{y_u}\sqrt{1 - r_{x_u y_u}^2}$$

and

$$\frac{\sigma_{y_c}}{\sigma_{y_u}} = \frac{\sqrt{1 - r_{x_u y_u}^2}}{\sqrt{1 - r_{x_c y_c}^2}} \tag{35}$$

(formula for estimating correlation in a wide range from a
knowledge of the correlation in a narrow range)

in which σ_{y_c} is the standard deviation of Y in the group of curtailed
range; σ_{y_u} is the standard deviation of Y in the group of uncurtailed
range; r_{x_c y_c} = the correlation in the curtailed group, and
r_{x_u y_u} = the correlation in the uncurtailed group.

To illustrate formula (35), suppose that in one group σ_{y_c} = 10
and r_{x_c y_c} is .50. What would the r between the same two tests
probably be in a group in which σ_{y_u} = 15, i.e., in which σ_{y_u} is
50% larger than σ_{y_c}? Substituting σ_{y_c} = 10, σ_{y_u} = 15, and
r_{x_c y_c} = .50 in (35), we have

$$\frac{10}{15} = \frac{\sqrt{1 - r_{x_u y_u}^2}}{\sqrt{1 - .25}}$$

Squaring both sides of this equation, and solving, we have
r_{x_u y_u} = .82. The r of .50 in the narrow range becomes an r of .82 in
the wide range. It is clear from this example that direct comparison
of r's is not valid when the variabilities (σ's) within the groups from
which the r's were computed are quite different.
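
Squaring formula (35) and solving for the wide-range correlation gives
r_u = √[1 − (σ_c/σ_u)²(1 − r_c²)], where the subscripts c and u stand
for the curtailed and uncurtailed groups. The Python sketch below
carries out this solution for the example just given; the function
name r_wide_range and its argument names are assumptions, and the
positive root is taken on the assumption that the correlation is
positive.

```python
import math


def r_wide_range(r_narrow, sd_narrow, sd_wide):
    """Solve formula (35) for the correlation in the uncurtailed group.

    Takes the positive square root, so a positive correlation is assumed.
    """
    ratio_sq = (sd_narrow / sd_wide) ** 2
    return math.sqrt(1.0 - ratio_sq * (1.0 - r_narrow ** 2))


# Example from the text: r = .50 where SD = 10; estimate r where SD = 15.
r_u = r_wide_range(0.50, 10.0, 15.0)
print(round(r_u, 2))
```

When the two standard deviations are equal the function returns the
narrow-range r unchanged, as formula (35) requires.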

If X and not Y is the dependent variable, formula (35) becomes

$$\frac{\sigma_{x_c}}{\sigma_{x_u}} = \frac{\sqrt{1 - r_{x_u y_u}^2}}{\sqrt{1 - r_{x_c y_c}^2}} \tag{36}$$

(formula for estimating correlation in a wide range from
a knowledge of the correlation in a narrow range)

Formulas (35) and (36) are open to the objection that each takes
account of only one distribution in estimating the probable increase
in r with increase in range of talent. If, however, the increase
in σ_x as the group becomes more heterogeneous is accompanied by a
proportional increase in σ_y (or vice versa), formulas (35) and (36)
will give accurate estimates. Experimental trial of these formulas
has yielded results closely in accord with theoretical expectation.*

* Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their
Mathematical Bases (New York: McGraw-Hill Book Co., 1940).


IV. The Solution of a Second Correlation Problem 


The solution of a second correlation problem will be found in Fig- 
ure 41. The purpose of another “model” is to strengthen the reader’s 
grasp of correlational techniques by having him work straight 
through the process of calculating r and the regression equations 
upon a new set of data. A student often fails to relate the various
aspects of a correlational problem when these are presented in
piecemeal fashion.


1. Calculation of r 


Our first problem in Figure 41 is to find the correlation between the
I.Q.'s achieved by 190 children of the same, or approximately the
same, chronological age who have taken an intelligence examination
upon two occasions separated by a six-month interval. The
correlation table has been constructed from a scattergram, as described
on page 129. The test given first is the X-variable, and the test given
second is the Y-variable. The calculation of the two means, and of
c_x, c_y, σ_x, and σ_y covers familiar ground, is given in detail on the
chart, and need not be repeated here.

The product-deviations in the Σx'y' column have been taken from
column 100-104 (column containing the AM_x) and from row 105-109
(row containing the AM_y). The entries in the Σx'y' column have
been calculated by the shorter method described on page 137; that is,
each cell entry in a given row has been multiplied first by its
x-deviation (x') and the sum of these deviations entered in the column
Σx'. The Σx' entries were then “weighted” once for all by the y' of the
whole row. To illustrate, in the first row reading from left to right
(1 × 5) + (1 × 6) or 11 is the Σx' entry. The x''s are 5 and 6,
respectively, and may be read from the x' row at the bottom of the
correlation table. Since the common y' is 5, the final Σx'y' entry is
55. Again in the seventh row reading down from the top of the
diagram, (… × −3) + (3 × −2) + (7 × −1) + (16 × 0) + (2 × 1)
+ (4 × 2) or −18 makes up the Σx' entry. The y' of this row is −1,
and the final Σx'y' entry is 18. To take still a third example, in the
eleventh row from the top of the diagram, (1 × −5) + (3 × −4)
+ (1 × −3) + (2 × −2) or −24 is the Σx' entry. The common y'
is −5 and the Σx'y' entry is 120.
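
The shorter method just described, in which each row's cell
frequencies are first multiplied by their x' values, summed, and then
weighted once by the row's y', may be sketched in Python as follows.
The function name row_product_sum and the layout of the row data are
assumptions made for illustration; the frequencies and deviations are
those quoted in the text for the first and eleventh rows of Figure 41.

```python
def row_product_sum(cells, y_dev):
    """Return (sum of f*x', sum of f*x'*y') for one row of a correlation table.

    `cells` lists (frequency, x_deviation) pairs for the occupied cells of
    the row; `y_dev` is the y' common to every cell in that row.
    """
    sum_x = sum(f * x_dev for f, x_dev in cells)
    return sum_x, sum_x * y_dev


# First row: one case each at x' = 5 and x' = 6, with y' = 5.
first = row_product_sum([(1, 5), (1, 6)], 5)

# Eleventh row: frequencies 1, 3, 1, 2 at x' = -5, -4, -3, -2; y' = -5.
eleventh = row_product_sum([(1, -5), (3, -4), (1, -3), (2, -2)], -5)

print(first, eleventh)
```

Summing the second member of each pair over all rows would give the
total of the product-deviation column used in the formula for r.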

Three checks of the calculations (see p. 135), upon which r, σ_x and
σ_y are based, are given in Figure 41. Note that Σfx' = Σx'; and that,
when the Σx'y' entries are recalculated, at the bottom of the chart,
Σfy' = Σy', and the two determinations of Σx'y' are equal. When the
Σx'y' entries have been checked, the calculation of r by formula (22)
is a matter of substitution. Note carefully that c_x, c_y, σ_x, σ_y are
all left in units of class-interval in the formula for r (p. 139).


2. Calculation of the regression equations and the SE's of estimate 


The regression equations in deviation form are given on the chart 
and the two lines which these equations represent have been plotted 
on the diagram. Note that these equations may be plotted as they 
stand, since the class-interval is the same for X and Y (p. 158). In 
the routine solution of a correlational problem it is not strictly neces- 
sary to plot the regression lines on the chart. These lines are often 
of value, however, in indicating whether the means of the X- and 
Y-arrays can be represented by straight lines, that is, whether regres- 
sion is linear. If the relationship between X and Y is not linear, 
other methods of calculating the correlation must be employed 
(p. 371).

The standard errors of estimate, shown in Figure 41, are 7.83 and
8.55, depending upon whether the prediction is of Y from X or X
from Y. All I.Q.'s predicted on the Y-test from X may be considered
to have the same error of estimate,* and similarly for all predictions
of X from Y.

Errors of estimate are most often used to give the reliability of
specific predicted measures. But they also have a more general
interpretation. Thus a σ(est. Y) of 7.83 points means that 68% of the
I.Q.'s predicted on test Y from test X may be expected to differ from
their actual values by not more than ±7.83 points, while the remaining
32% may be expected to differ from their actual values by more than
±7.83 points.

* See, however, Terman, L. M., and Merrill, M. A., Measuring
Intelligence (Boston: Houghton Mifflin Co., 1937), where errors of
estimate have been computed for various I.Q. levels.


3. The “regression effect” in prediction


Predicted scores tend to “move in” toward the mean of the
distribution into which prediction is made (p. 154). This so-called
regression effect has often been noted by investigators and is always
present when correlation is less than ±1.00.* The regression
phenomenon can be clearly seen in the following illustrations: From the
regression equation Y = .69X + 32.6 (Fig. 41) it is clear that a child
who earns an I.Q. of 130 on the first test (X) will most probably
earn an I.Q. of 122 on the second test (Y); while a child who earns
an I.Q. of 120 in X will most probably score 115 in Y. In both of
these illustrations the predicted Y-test I.Q. is lower than the first
or X-test I.Q. Put differently, the second I.Q. has regressed or moved
down toward the mean of test Y, i.e., toward 102.7. The opposite effect
occurs when the I.Q. on the X-test is below its mean: the tendency
now is for the predicted score in Y to move up toward its mean.
Thus from the equation Y = .69X + 32.6, we find that if a child
earns an I.Q. of 70 on the X-test his most likely score on the second
test (Y) is 81; while an I.Q. of 80 on the first test forecasts an I.Q. of
88 on the second. Both of these predicted I.Q.'s have moved up
nearer to the mean 102.7 (i.e., M_y).

The tendency for all scores predicted from a regression equation
to pull in, down or up, toward the mean can be seen as a general
phenomenon if the regression equation is written in standard-score
form. Given

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x \qquad \text{(29), p. 154}$$

if we divide both sides of this equation by σ_y and write σ_x under x,
we have

$$\frac{y}{\sigma_y} = r\,\frac{x}{\sigma_x} \quad \text{or} \quad z_y = r z_x \tag{37}$$
(regression equation when scores in X and Y are expressed
as z or σ-scores)

In the problem in Figure 41, z_y = .76z_x. If z_x is ±1.00σ, or ±2.00σ,
or ±3.00σ from M_x, z_y will be ±.76σ, ±1.52σ, or ±2.28σ from M_y.
That is to say, any score above or below the mean of X forecasts a
Y-score somewhat closer to the mean of Y.
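
The regression effect in these illustrations can be reproduced with a
few lines of Python. The sketch is illustrative only: the names
predict_y and z_pred are assumptions, while the slope .69, the constant
32.6, and the r of .76 are the values quoted from Figure 41.

```python
def predict_y(x, slope=0.69, intercept=32.6):
    """Score-form regression of the second I.Q. (Y) on the first (X), Fig. 41."""
    return slope * x + intercept


# First-test I.Q.'s above and below the mean, with their forecasts.
first_test = (130, 120, 80, 70)
forecasts = [round(predict_y(x)) for x in first_test]
for x, y_hat in zip(first_test, forecasts):
    print(f"X = {x:3d}  ->  predicted Y = {y_hat}")

# Standard-score form, formula (37): z_y = r * z_x with r = .76.
R = 0.76
z_pred = [round(R * z, 2) for z in (1.0, 2.0, 3.0)]
print(z_pred)
```

In every case the forecast lies nearer the mean than the score from
which it was made, and the shrinkage in standard-score units is simply
the factor r.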

* Thorndike, R. L., “Regression Fallacies in the Matched Groups
Experiment,” Psychometrika, 1942, 7, 85-102.

In studying the relation of height in parent and offspring, Galton
(p. 154) interpreted the phenomenon of regression to the mean to be
a provision of nature designed to protect the race from extremes.
This same effect occurs, however, in any correlation table in which r
is less than ±1.00, and need not be explained in biological terms.
The I.Q.'s of a group of very bright children, for instance, will tend
upon retest to move downward toward 100, the mean of the group;
while the I.Q.'s of a group of dull children will tend upon retest to
move upward toward 100.


V. The Interpretation of the Coefficient of Correlation 


When should a coefficient of correlation be called “high,” when
“medium,” and when “low”? Does an r of .40 between two tests
indicate “marked” or “low” relationship? How high should an r be
in order to permit accurate prediction from one variable to another?
Can an r of .50, say, be interpreted with respect to “overlap” of
determining factors in the two variables correlated? Questions like
these, all of which are concerned with the significance or meaning of
the relationship expressed by a correlation coefficient, constantly arise
in problems involving mental measurement, and their implications
must be understood before we can effectively employ the
correlational method.

The value of r as a measure of correspondence may be profitably
considered from two points of view. In the first place, r's are
computed in order to determine whether there is any correlation (over
and above chance) between two variables; and in the second place,
r's are computed in order to determine the degree or closeness of
relationship when some association is known, or is assumed, to exist.
The question, “Is there any correlation between brain weight and
intelligence?”, voices the first objective. And the question, “How
significant is the correlation between high-school grades and
first-year performance in college?”, expresses the second. The problem
of when an obtained r denotes significant relationship will be
considered later, on page 197. This section is concerned mainly with
the second problem, namely, the evaluation, with respect to degree
of relationship, of an obtained coefficient. The questions at the
beginning of the paragraph above all bear upon this topic.


1. The interpretation of r in terms of verbal description


It is customary in mental measurement to describe the correlation
between two tests in a general way as high, marked or substantial,
low or negligible. While the descriptive label applied will vary
somewhat in meaning with the author using it, there is fairly good
agreement among workers with psychological and educational tests that
an

r from .00 to ±.20 denotes indifferent or negligible relationship;
r from ±.20 to ±.40 denotes low correlation; present but slight;
r from ±.40 to ±.70 denotes substantial or marked relationship;
r from ±.70 to ±1.00 denotes high to very high relationship.

This classification is broad and somewhat tentative, and can only
be accepted as a general guide with certain reservations. Thus a
coefficient of correlation must always be judged with regard to

(1) the nature of variables with which we are dealing;
(2) the significance of the coefficient;
(3) the size and variability of the group (p. 166);
(4) the reliability coefficients of the tests used (p. 342);
(5) the purpose for which the r was computed.


To consider, first, the matter of the variables being correlated, an r
of .30 between height and intelligence, or between head measurements
and mechanical ability would be regarded as important although it
is rather low, since correlations between physical and mental
functions are usually much lower, often zero. On the other hand, the
correlation must be .70 or more between measures of general
intelligence and school grades or between achievement in English and in
history to be considered high, since r's in this field usually run from
.40 to .60. Resemblances of parents and offspring with respect to
physical and mental traits are expressed by r's of .35 to .55; and,
accordingly, an r of .60 would be high.* By contrast, the reliability
of a standard intelligence test is ordinarily much higher than .60, and
the self-correlation of such a test must be .85 to .95 to be regarded
as high. In the field of vocational testing, the r's between test
batteries and measures of aptitude represented by various criteria
rarely rise above .50; and r's above this figure would be considered
exceptionally promising.

* Jones, H. E., “A First Study of Parent-Child Resemblance in
Intelligence,” 27th Yearbook of the N.S.S.E., 1928, Part I, 61-72.

Correlation coefficients must be evaluated also with due regard to
the reliabilities (p. 332) of the two tests concerned. Because of
chance errors, an obtained r is always less than its “corrected” value
(p. 346) and hence, in a sense, is a minimum measure of the
relationship present. The effect upon an r of the size and variability of
the group is discussed elsewhere (p. 167), and a formula for estimating
such effect provided. The purpose for which the correlation has been
computed is important.* The r which is to be employed in predicting
the scores of individuals from one test to another, for instance, should
be much higher than the r the purpose of which is to provide
forecasts of the achievement of selected groups (p. 344).

In summary, a correlation coefficient is always to be judged with
reference to the circumstances under which it was obtained. There
is no such thing as the correlation between mechanical aptitude and
abstract intelligence, for instance, but only a correlation between
certain tests of mechanical aptitude and intelligence given to certain
groups under definite conditions. Correlation coefficients are always
to be thought of as relative and never as absolute indices of
relationship.


2. The interpretation of r in terms of $\sigma_{(est.)}$ and the coefficient of alienation


* Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques,
op. cit., Chapters 7 and 8.

One of the most practical ways of evaluating the effectiveness of a
coefficient of correlation is through the standard error of estimate,
$\sigma_{(est.)}$. We have found (p. 161) that $\sigma_{(est.\ y)}$, which equals
$\sigma_y\sqrt{1 - r^2}$, enables us to tell how accurately we can estimate (by
means of the regression equation) an individual's score in Test Y
when we know his score in Test X. The size of $\sigma_{(est.\ y)}$ depends
directly upon $\sigma_y$ and upon the correlation between the two tests.
When r = 1.00, $\sigma_{(est.\ y)}$ = .00, and we can predict a person's score
in Y, knowing his score in X, with 100% accuracy (no error). On the
other hand, when r = .00, $\sigma_{(est.\ y)} = \sigma_y$, and we can only be certain
that the predicted score lies somewhere within the limits of the
Y-distribution, i.e., within the limits Mean Score $\pm 3\sigma_y$. In other
words, when r = .00 our estimate of a person's Y-score is not aided
at all by a knowledge of his score in X. As r decreases from 1.00 to
.00, the standard error of estimate increases so markedly that
predictions from the regression equation range all the way from
certainty to what is virtually a “guess.” * The significance of an r, with
respect to predictive value, therefore, may be accurately gauged by
the extent to which r improves our prediction over a “mere guess.”

The following problem will serve as an illustration: Suppose that
the correlation between two tests Y and X is .60, and that $\sigma_y$ = 5.00.
Then $\sigma_{(est.\ y)}$ is $5 \times \sqrt{1 - .60^2}$ or 4.00. This SE is 20% less than 5.00,
the $\sigma_{(est.\ y)}$ when r = .00, i.e., when $\sigma_{(est.\ y)}$ has minimum predictive
value. The amount of reduction in $\sigma_{(est.\ y)}$ as r varies from .00 to
1.00 is given by the expression $\sqrt{1 - r^2}$, and hence it is possible from
$\sqrt{1 - r^2}$ alone to gauge the predictive value of an r. The expression
$\sqrt{1 - r^2}$ is often called the coefficient of alienation and is denoted by
the letter k. The coefficient of alienation may be thought of as
measuring the absence of relationship between two variables X and Y in
the same sense in which r measures the presence of relationship.
When k = 1.00, r = .00, and when k = .00, r = 1.00: the larger the
coefficient of alienation the smaller the degree of relationship, and
the less precise the prediction from X to Y. In order to show how the
estimate improves as r increases, the k's for certain values of r from
.00 to 1.00 are tabulated in Table 24.


TABLE 24 Coefficients of alienation (k) for values of r from .00 to 1.00


     r        k = √(1 − r²)        r        k = √(1 − r²)
   .0000        1.0000           .8000        .6000
   .1000         .9950          (.8660)      (.5000)
   .2000         .9798           .9000        .4359
   .3000         .9539           .9500        .3122
   .4000         .9165           .9800        .1990
   .5000         .8660           .9900        .1411
   .6000         .8000          1.0000        .0000
   .7000         .7141
  (.7071)       (.7071)


Note that r must be .866 before k lies halfway between 1.00 and
.00, that is, before the standard error of estimate is reduced to one-half of
its value where r = .00. For r's of .80 or less, the coefficients of
alienation are clearly so large that predictions of individual scores
based upon the regression equation are little better than “guesses.” †
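
The arithmetic of the illustration and of Table 24 may be checked with a short computation. The following Python sketch is offered only as an aid to the reader; the function names are assumptions of the sketch and are not part of the text.

```python
import math


def coefficient_of_alienation(r):
    """Return k = sqrt(1 - r^2) for a correlation coefficient r."""
    return math.sqrt(1.0 - r * r)


def standard_error_of_estimate(sigma_y, r):
    """Return sigma_(est. y) = sigma_y * sqrt(1 - r^2)."""
    return sigma_y * coefficient_of_alienation(r)


# Illustration from the text: r = .60 and sigma_y = 5.00.
print(f"sigma_(est. y) = {standard_error_of_estimate(5.00, 0.60):.2f}")

# Values of k for the r's listed in Table 24.
for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7071,
          0.8, 0.866, 0.9, 0.95, 0.98, 0.99, 1.0):
    print(f"r = {r:.4f}   k = {coefficient_of_alienation(r):.4f}")
```

Running the loop shows how slowly k falls as r rises: k does not reach .50 until r is about .866.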


* The term “guess” as here used does not imply an estimate which is based
upon no information whatsoever (a shot in the dark, so to speak). When
r = .00, the most probable Y-score predicted for every individual in the
X-distribution is $M_y$, and $\sigma_{(est.\ y)} = \sigma_y$. Hence, our Y-estimates are guesses in the
sense that they may lie anywhere in the Y-distribution, but not anywhere ...

† ... is more efficient in forecasting the probable success of a group (see ...
Even when r = .99, the standard error of estimate is still 1/7 as large
as when r = .00. In contrast to actuarial prediction, therefore, the
estimation of an individual’s score in one test from his score in
another is not often warranted unless r is at least .90.

The coefficient E given by the formula below is often useful in
providing a quick estimate of the predictive efficiency of an obtained
r. E, which is called the “coefficient of forecasting efficiency” or the
coefficient of dependability, is derived from k as follows:


$$E = 1 - \sqrt{1 - r^2} \qquad (38)$$
or
$$E = 1 - k$$


(“coefficient of forecasting efficiency” or coefficient
of dependability) *


To illustrate the application of E, suppose that the correlation of a
test (or of a test battery) with some criterion of performance is .50.
From formula (38) E = 1 − .87 or .13; and the test’s efficiency in
predicting criterion scores may be put at 13%. When r = .90,
E = .56 and the test is 56% efficient; when r = .98, E = .80 and the
test is 80% efficient, and so on. Obviously, the correlation must be
above .87 for the test’s forecasting efficiency to be greater than 50%.

E gives essentially the same information as $\sigma_{(est.\ y)}$ or k. Thus, if
r = .50, k = .87 and $\sigma_{(est.\ y)}$ is 87% of $\sigma_y$, which is its value when
r = .00. Accordingly, an r of .50 reduces the $\sigma_{(est.\ y)}$ by 13%.
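
Formula (38) is easily verified numerically. The sketch below is illustrative only; the function names are assumptions of the sketch, and the second function simply solves formula (38) for r.

```python
import math


def forecasting_efficiency(r):
    """Return E = 1 - sqrt(1 - r^2), formula (38)."""
    return 1.0 - math.sqrt(1.0 - r * r)


def r_required_for_efficiency(e):
    """Solve formula (38) for r: r = sqrt(1 - (1 - E)^2)."""
    return math.sqrt(1.0 - (1.0 - e) ** 2)


# The r's used in the illustration above.
for r in (0.50, 0.90, 0.98):
    print(f"r = {r:.2f}   E = {forecasting_efficiency(r):.2f}")

# The r at which forecasting efficiency reaches 50%.
print(f"r for E = .50: {r_required_for_efficiency(0.50):.3f}")
```

The last line gives the r of about .87 cited in the text as the point at which efficiency reaches 50%.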


3. The interpretation of r in terms of the coefficient of determination ($r^2$)


The interpretation of r in terms of “overlapping” factors in the
tests being correlated may be generalized through an analysis of the
variance ($\sigma^2$) of the dependent variable, usually the Y test. In
studying the variability among individuals upon a given test, the
variance of the test scores is often a more useful measure of “spread”
than is the standard deviation. The object in analyzing the variance
of Test Y is to determine from the correlation between Y and X what
part of Test Y's variance is associated with, or dependent upon, the
variance of Test X, and what part is determined by the variance of
factors not in Test X.

When we have computed the correlation between Tests X and Y,
$\sigma_y^2$ provides a measure of the total variance of the Y-scores; and
$\sigma_{(est.\ y)}^2$, which equals $\sigma_y^2(1 - r_{xy}^2)$, gives a measure of the variance
left in Test Y when that part of the variance caused by Test X has
been ruled out or made constant. Instead of $\sigma_{(est.\ y)}$ the designation
$\sigma_{y.x}$ is often used to denote that variability in X, insofar as it affects
Y, is ruled out. What is meant by the term “X constant” or “ruled
out” may be seen in Figure 39 where the variability within any
column (“140-149,” for instance) is given by $\sigma_y\sqrt{1 - r_{xy}^2}$. X has a
constant value for each column (X = 144.5 in column 140-149, for
example) and accordingly $\sigma_{y.x}$ becomes a measure of the variability
of Y for a constant X. In Figure 39, $\sigma_{y.x}$ is 2.10 in the column 140-
149 as compared with a "total" $\sigma_y$ of 2.62.

* See Conrad, H. S., and Martin, G. B., “The Index of Forecasting Efficiency
... of a ‘True’ Criterion,” Journal of Experimental Education, 1935.

The relationship between $\sigma_y$ and $\sigma_{y.x}$ may be seen in the following
illustration. If we have the correlation between height (Y) and weight (X)
in a group of school children, $\sigma_y^2$ will be reduced to $\sigma_{y.x}^2$ when the
variance in weight is zero, that is, when all of the children in the group have
the same weight. If $\sigma_{y.x}^2$ is subtracted from $\sigma_y^2$, there remains that
part of the variance of Test Y which is associated with X; and if this
is divided by $\sigma_y^2$, we obtain that fraction of the variance of Y
attributable to or associated with X. Carrying out these operations,
we have

$$\frac{\sigma_y^2 - \sigma_{y.x}^2}{\sigma_y^2} = \frac{\sigma_y^2 - \sigma_y^2(1 - r_{xy}^2)}{\sigma_y^2} = r_{xy}^2$$

from which it is clear that $r_{xy}^2$ gives the proportion of the variance of
Test Y which is associated with Test X. When used in this way, $r^2$
is called the coefficient of determination. If the correlation between
Tests Y and X is .707, $r^2$ is .50. Hence, an r of .707 means that 50%
of the variance of Test Y is associated with the variability in Test X.
Since $r^2 + k^2 = 1.00$, the proportion of the variance in Test Y which
is not associated with Test X is given by $k^2$. In the present case,
since $r^2$ is .50, $k^2$ is also .50.

The coefficient of determination tells us what part of the variance
of Test Y is determined by Test X. But r alone gives us no information
as to the character of the association and we cannot assume a
causal relationship unless we have evidence beyond the correlation.

Inspection of the squares of small coefficients of correlation
emphasizes the slight degree of association, in terms of related changes in
variability, indicated by low r's. An r of .10, for example, or .20, or
.30, between Tests X and Y, indicates that only 1%, 4%, and 9%,
respectively, of the variance of Y is associated with X. On the other
hand, when r is .95, about 90% ($r^2$ = .90) of the variance of Test Y
is associated with Test X, only 10% being unrelated. Valuable
insight into the part played by one or more variables in determining
the total variance of a criterion may be obtained through the
coefficient of determination.
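
The division of the variance of Y into an associated part ($r^2$) and an unassociated part ($k^2$) can be tabulated quickly. The sketch below is illustrative only; the function name is an assumption of the sketch.

```python
def variance_partition(r):
    """Return (r^2, k^2): the proportions of the variance of Y that are
    associated with X and independent of X, respectively."""
    r_squared = r * r
    return r_squared, 1.0 - r_squared


# The r's discussed in the text.
for r in (0.10, 0.20, 0.30, 0.707, 0.95):
    associated, independent = variance_partition(r)
    print(f"r = {r:.3f}   associated = {associated:.1%}   "
          f"independent = {independent:.1%}")
```

The two proportions always sum to 1.00, which is the relation $r^2 + k^2 = 1.00$ given above.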


4. Summary 


It may be helpful to summarize the main points brought out in 
this section. 


(1) Whether an obtained r is to be regarded as “high,” “medium,” 
or “low” will depend upon the variables being studied, the re- 
liability coefficients of the two tests, the size of the group and its 
variability, and the purpose for which the r is being computed. 
Correlation coefficients are never absolute indices of relation- 
ship. 


(2) The accuracy with which an r enables us to predict (through
the regression equation) individual scores in Test Y from given
scores in Test X may be determined from $\sigma_{(est.\ y)}$, from E, and
from k, the coefficient of alienation.


(3) The coefficient of determination provides a method of determining
what proportion of the total variance ($\sigma_y^2$) of Test Y is associated
with Test X, and what proportion is independent of Test
X. This method of analysis may be extended to problems employing
partial and multiple correlation (p. 396).


PROBLEMS 


1. Write out the regression equations in score form for the correlation
table in example 3, page 149.
(a) Compute $\sigma_{(est.\ y)}$ and $\sigma_{(est.\ x)}$.
(b) What is the most probable height of a boy who weighs 30 pounds?
45 pounds? What is the most probable weight of a boy who is 36
inches tall? 40 inches tall?
2. In example 4, page 149, find the most probable grade made by a child
whose score on Army Alpha is 120. What is the $\sigma_{(est.)}$ of this grade?
3. What is the most probable algebra grade of a child whose I.Q. is 100
(data from example 5, p. 150)? What is the $\sigma_{(est.)}$ of this grade?
4. Given the following data for two tests:

History (X)          English (Y)
$M_x$ = 75.00        $M_y$ = 70.00
$\sigma_x$ = 6.00    $\sigma_y$ = 8.00
$r_{xy}$ = .72

(a) Work out the regression equations in score form.
(b) Predict the probable grade in English of a student whose history
mark is 65. Find the $\sigma_{(est.)}$ of this prediction.
(c) If $r_{xy}$ had been .84 ($\sigma$'s and means remaining the same) how much
would $\sigma_{(est.\ y)}$ be reduced?
5. The correlation of a test battery with worker efficiency in a large
factory is .40, and 70% of the workers are regarded as “satisfactory.”
(a) From seventy-five applicants you select the best twenty-five
in terms of test score. How many of these should be satisfactory
workers?
(b) How many of the best ten should be satisfactory?
(c) How many in the two groups should be satisfactory if selected at
random, i.e., without using the test battery?
6. Plot the regression lines on the correlation diagram given in example
5, page 150. Calculate the means of the Y-arrays (successive Y-columns),
plot as points on the diagram, and join these points with straight
lines. Plot, also, the means of the X-arrays and join them with straight
lines. Compare these two “lines-through-means” with the two fitted
regression lines (see Fig. 39, p. 153).
7. In a group of 115 freshmen, the r between reaction time to light and
substitution learning is .30. The $\sigma$ of the reaction times is 20 ms. What
would you estimate the correlation between these two tests to be in a
group in which the $\sigma$ of the reaction times is 25 ms.?
8. Show the regression effect in example 4, page 149, by calculating the
regression equation in standard-score form. For I.Q.'s $\pm 1.00\sigma$ and
$\pm 2.00\sigma$ from the mean I.Q., find the corresponding school marks in
standard-score form.
9. Basing your answer upon your experience and general knowledge of
psychology, decide whether the correlation between the following pairs
of variables is most probably (1) positive or negative; (2) high,
medium, or low.
(a) Intelligence of husbands and wives.
(b) Brain weight and intelligence.
(c) High-school grades in history and physics.
(d) Age and radicalism.
(e) Extroversion and college grades.
10. How much more will an r of .80 reduce a given $\sigma_{(est.)}$ than an r of .40?
An r of .90 than an r of .40?
11. (a) Determine k and E for the following r's: .35; −.50; .70; .95.
Interpret your results.
(b) What is the “forecasting efficiency” of an r of .45? an r of .99?
12. The correlation of a criterion with a test battery is .75. What percent
of the variance of the criterion is associated with variability in the
battery? What percent is independent of the battery?


ANSWERS 
1. Y = .40X + 24.12; X = 1.26Y − 11.52
(a) $\sigma_{(est.\ y)}$ = 1.78; $\sigma_{(est.\ x)}$ = 3.16
(b) 36.12 inches; 42.12 inches; 33.84 pounds; 38.88 pounds
2. 85.2; $\sigma_{(est.\ y)}$ = 7.0
3. X = .37Y + 8.16. When Y (I.Q.) is 100, X (algebra) is 45.2; $\sigma_{(est.\ x)}$ = 6.8
4. (a) Y = .96X − 2; X = .54Y + 37.2
(b) 60.4; $\sigma_{(est.\ y)}$ = 5.5
(c) 22%
5. (a) 21
(b) 9
(c) 17.5 and 7 (i.e., 70%)
7. r = .65
8. ±.46 and ±.92
10. Five times as much; seven times as much.
11. (a)   r      k      E
         .35    .94    .06
        −.50    .87    .13
         .70    .71    .29
         .95    .31    .69
(b) 11%; 86%
12. 56%; 44%


2) 


THE RELIABILITY OF THE MEAN 
AND OF OTHER STATISTICS 


+ 


I. The Meaning of Reliability 


The true mean or the true $\sigma$ of any set of measurements (of height,
mechanical aptitude or intelligence, for example) is that hypothetical
value obtained by taking into account the scores made by all of
the members of some defined group called the population. Since it is
rarely if ever possible to measure all of the members of a population,
we must usually be content with “samples”; and owing to slight
differences in the composition of these samples, computed means and
$\sigma$'s may be somewhat larger or somewhat smaller than their true
values. Population measures are called parameters, and are to be
thought of as fixed reference points. Measurements obtained from
samples are called statistics. Statistics are always estimates of their
parameters; and the accuracy of the estimate is a measure of the
reliability of the statistic.

While we cannot determine the parameters themselves, we can
estimate them by computing the amount by which our statistics
probably diverge from these parameters. This amount, which may
be large or small, serves as an index of the dependability or
trustworthiness of the statistic. Whenever we have calculated a statistic,
therefore, we should ask ourselves this question: “How accurate an
estimate is this statistic (mean or SD, say) of the parameter which I
would get by taking into account the entire population from which
this sample was drawn?” The purpose of this chapter is to outline
methods which will enable us to answer this question. The reliability
of the mean and the median will first be considered; following this
the reliability of the $\sigma$ and Q and of certain other useful statistics.
II. The Reliability of the Mean and of the Median
1. The reliability of the mean


(1) THE STANDARD ERROR (SE) OF THE MEAN ($\sigma_M$)


What is meant by the reliability of the mean can best be under- 
stood by examining the factors upon which the stability of this 
measure depends. Suppose that we wish to know the mean ability of 
college freshmen in the United States as shown by their scores upon 
the American Council Psychological Examination. To measure the 
achievement of college freshmen in general would require in strict 
logic that we test all of the freshmen in the United States. But this 
is obviously a stupendous if not an impossible task, and we must 
perforce be satisfied with taking the records of a sample as large and 
as randomly drawn as possible. The definition of a random sample 
is given on page 202. Suffice it to say here that we cannot use fresh- 
men from only a single institution or from only one section of the 
country; and that we must guard against selecting only those with 
high, or only those with low, scholastic records. The more successful 
we are in getting an “unselected” group, the more nearly representa- 
tive this group will be of all freshmen in the country. Evidently, 
therefore, the reliability of a mean depends for one thing upon how 
impartially we have chosen our sample. 

Given an adequate sample, the reliability of a mean can be shown
to depend mathematically upon two characteristics of the distribution:
(1) the number of cases and (2) the variability or spread of
the measures. The formula for the standard error of the mean is


$$SE_{Mean} \text{ or } \sigma_M = \frac{\sigma}{\sqrt{N}} \qquad (39)$$

(standard error of the arithmetic mean)


where $\sigma$ = standard deviation of the population and
N = number of cases in the sample.


In this formula for the SE of the mean the $\sigma$ in the numerator is
really the population and not the sample standard deviation. As we
rarely have the population $\sigma$ we must of necessity use an estimate
of it (p. 190), and our best estimate is the SD of the sample. Modern
writers on statistics often make a distinction between the standard
deviation of the population and the standard deviation of a sample
drawn from this population, designating the population SD by $\sigma$ and
the sample SD by s. While this distinction is helpful, $\sigma$ as a symbol
for the standard deviation of a sample has been so widely used in
the psychological literature ($\sigma$-scaling, $\sigma$-units, and the like) that
in this chapter we shall continue to use only $\sigma$ (or SD) and not s.
We shall designate the standard deviation as population $\sigma$ or sample
$\sigma$ when the meaning is not evident from the context.

It is clear that the number of cases influences the mean, since the
addition of even one extra measure to a series will change the mean
unless the additional case happens to coincide with the mean exactly.
Moreover, the addition of 1 score to a set of 10 scores will effect a
greater change in the obtained mean than the addition of 1 score to a
set of 1000 scores, as each case counts for less in the larger group. It
has been shown mathematically that the reliability of a sample mean
increases, not in proportion to the number of scores upon which it is
based, but in proportion to the square root of the number of scores.
The mean obtained from 25 scores is not 25 times, but $\sqrt{25}$ or 5
times, as reliable as a single score. And a mean based upon an N of
36 is not 4 times as reliable as a mean based upon an N of 9, but only
2 times as reliable, since $\sqrt{36}$ divided by $\sqrt{9}$ equals 2.

The reliability of a mean depends also upon the variability of the
separate measures around the obtained mean. If the $\sigma$ of the sample
is large, we are unable to say where the means of other samples
which we have not drawn will most probably fall, whether they will
be close to, or far from, the given obtained mean. On the other hand,
if the $\sigma$ is small, we may be fairly certain that other sample means
will fall reasonably close to the mean of our sample. The reliability
of a sample mean, therefore, will vary with the size of the $\sigma$; as $\sigma$
increases, reliability decreases.

The SE of the mean is an important and much-used formula. It
measures the extent to which this statistic is affected by (a) errors of
measurement as well as by (b) sampling errors, i.e., differences
occasioned by fluctuation from sample to sample. A decrease in $\sigma$ or an
increase in N will cause the standard error to become smaller
numerically. A decrease in $\sigma_M$ means that the amount by which the
obtained mean probably misses the mean of the population is just so
much less. In short, the reliability of a sample mean increases as
$\sigma_M$ decreases.
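
The dependence of $\sigma_M$ upon N and upon $\sigma$ can be seen by evaluating formula (39) for a few cases. The sketch below is illustrative only; the function name and the $\sigma$ of 10 are assumptions of the sketch, chosen merely for convenience.

```python
import math


def standard_error_of_mean(sd, n):
    """Return sigma_M = sd / sqrt(N), formula (39).

    sd is the sample SD, used here as the estimate of the population sigma.
    """
    return sd / math.sqrt(n)


# Effect of N with sigma held at 10: quadrupling N only halves sigma_M.
for n in (9, 36, 144):
    print(f"N = {n:4d}   sigma_M = {standard_error_of_mean(10.0, n):.3f}")

# Effect of sigma with N held at 36: sigma_M rises in step with sigma.
for sd in (5.0, 10.0, 20.0):
    print(f"sigma = {sd:5.1f}   sigma_M = {standard_error_of_mean(sd, 36):.3f}")
```

The first loop bears out the statement that a mean based upon 36 cases is only twice, not four times, as reliable as one based upon 9.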
(2) APPLICATION AND USE OF THE SE OF THE MEAN


A problem will serve to illustrate the use and interpretation of the 
SE of the mean. 


Example (1) In 1883, the Anthropometric Committee of the
British Association found the mean height of 8585 adult males in
the British Isles to be 67.46 inches, with a SD of 2.57 inches. How
reliable is this measure of mean height? Specifically, how much
does it probably diverge from the true mean (parameter) which
might have been obtained had all adult males in the British Isles
been measured?


We cannot answer this question precisely as the value of the true
mean is, of course, unknown. But we can give an estimate of
reliability in terms of the probable divergence of our mean from the
TM (true mean). Applying formula (39), we find the $SE_M$ to be

$$\sigma_M = \frac{2.57}{\sqrt{8585}} = .028 \text{ inch}$$

This SE * approximates to the SD of a distribution of means which,
like our mean of 67.46 inches, are all derived from samples drawn
from the common population. The normal curve in Figure 42
represents this distribution of sample means: it is centered at the
hypothetical true mean (TM) and its SD ($\sigma_M$) is .028 inch. Sample means
fall equally often on the plus and minus sides of the TM. About
two-thirds of such means (actually 68.26%) lie within $\pm 1.0\sigma_M$ of the
TM, that is, within a range of ±.028 inch. Also, 95 out of 100
sample means lie within $\pm 2.0\sigma_M$ (more exactly $\pm 1.96\sigma_M$) of the
true mean; and accordingly miss the TM by not more than ±.055
inch (±1.96 × .028).


FIG. 42 Sampling distribution of means showing variability of obtained
means around the true or population mean (TM) in terms of $\sigma_M$ (.028).
[Horizontal axis: −0.084, −0.056, −0.028, TM, +0.028, +0.056, +0.084]


* Our $\sigma$ of 2.57 inches is our best estimate of the population $\sigma$
and hence is used as our closest approximation to it.

Our mean of 67.46 inches is, of course, one of the sample means
represented in the sampling distribution of Figure 42. Hence the
probability is high (P = .95) that 67.46 inches (or any sample
mean) does not miss the population mean (the parameter) by more
than ±.055 inch. And conversely, the probability is .05 (one chance
in 20) that 67.46 inches does miss the TM by more than ±.055 inch.
Both of these statements are estimates of the reliability of our
sample mean in terms of its probable divergence from the population
mean. Deviations from the TM which are less likely of occurrence
than those listed above may be computed by taking into account
more of the sampling distribution of means in Figure 42.


Discussion 


How the standard error measures the reliability or stability of an obtained
mean may be more clearly shown perhaps in the following way: Suppose
that we have calculated the mean height of each of 100 groups of men; that
each group contains 8585 subjects; and that the groups or samples are
drawn at random from the general population. The 100 means obtained
from these samples will tend to differ slightly from one another owing to
"errors of sampling," or sampling fluctuations. Hence, not all samples will
represent with equal fidelity the population from which they have been
drawn. It can be shown mathematically that the frequency distribution of
these sample means will fall into a normal distribution around the “true”
or population mean as their measure of central tendency. Even when the
samples are themselves skewed, the means from such samples will be
normally distributed. This “sampling distribution” of means measures the
errors of sampling or fluctuations in mean values from sample to sample.
In this hypothetical normal distribution of means we find relatively few
large plus or minus deviations; and many small plus, small minus, and zero
deviations. In short, the obtained means will hit very near to the true mean,
or fairly close to it, more often than they will miss it by large amounts.

The mean of our distribution of 100 means is our best estimate of the
“true” or population mean. And our best estimate of the $\sigma$ of this
distribution of means is the standard error of the mean which we have calculated.
In other words, $\sigma_M$ measures the spread of sample means around the true
or population mean. It is because of this fact that the standard error of the
mean becomes a measure of the amount by which any sample mean
probably diverges from the population mean.

The results of our hypothetical experiment are represented graphically
in Figure 42, page 184. The 100 sample means are represented by a normal
frequency distribution around the TM (true mean) and $\sigma_M$ is put equal to
.028. The heights of the different ordinates (y’s) represent the frequency
of the various sample means. The $\sigma$ of a normal distribution when
measured off in the plus and minus directions from the mean includes the
middle 68.26% of the cases. About 68 of our 100 obtained means, therefore,
may be expected to miss the TM by not more than $\pm 1\sigma_M$ (±.028 inch);
and about 95 of our obtained means may be expected to miss the TM by
not more than $\pm 2\sigma_M$ (±.056 inch). Since our mean of 67.46 inches is one
of these sample means the probability is approximately .95 that 67.46 inches
does not miss the true mean by more than ±.056 inch.
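
The hypothetical experiment described above can be imitated by drawing repeated random samples on a computer. The sketch below is illustrative only and rests on several assumptions not found in the text: the population is taken to be a skewed (exponential) distribution with mean 1 and $\sigma$ of 1, the samples are smaller (N = 50) and more numerous (2000) than in the discussion, and the seed is fixed so that the run can be repeated.

```python
import math
import random
import statistics


def simulate_sample_means(n_samples, n_per_sample, seed=0):
    """Draw n_samples random samples from an assumed exponential
    population (mean 1, sigma 1) and return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n_per_sample)]
        means.append(statistics.fmean(sample))
    return means


N_PER_SAMPLE = 50
means = simulate_sample_means(2000, N_PER_SAMPLE)

# sigma_M expected from formula (39), with population sigma = 1.
theoretical_se = 1.0 / math.sqrt(N_PER_SAMPLE)
# Observed spread of the sample means around their own mean.
empirical_se = statistics.pstdev(means)
# Share of sample means lying within 1.96 sigma_M of the true mean (1.0).
share_within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)

print(f"sigma_M from formula (39): {theoretical_se:.4f}")
print(f"SD of the simulated means: {empirical_se:.4f}")
print(f"share within 1.96 sigma_M: {share_within:.3f}")
```

Although the assumed population is markedly skewed, the SD of the simulated means should agree closely with $\sigma/\sqrt{N}$, and roughly 95 in 100 of the means should fall within $\pm 1.96\sigma_M$ of the true mean.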


(3) DEFINING RELIABILITY IN TERMS OF LEVELS OF CONFIDENCE 


The definition of reliability in terms of the "probable divergence
of statistic from parameter" is straightforward and reasonable as it
is evident that confidence can be placed in an obtained mean if there
is small likelihood of its having missed its true value by a large
amount. An obvious difficulty with probability statements concerning
reliability, however, arises from our inability to say how far the
sample mean must miss the TM before the expected deviation is to
be judged “large.” The sampling error allowable in a mean will
always depend upon the purpose of the experiment, the standards of
accuracy demanded, the units of measurement employed and other
factors.* An experimenter can never say categorically that a computed
mean is, or is not, reliable, as reliability is a relative, not an
absolute, concept. But he can set up accuracy limits which will mark
off for a given degree of probability the deviation of computed mean
from TM. Degree of confidence in the stability of a given statistic
will then depend upon the accuracy limits imposed.

Two sets of accuracy limits are in general use and have been
accepted as standard by most investigators. These limits define
what are called the .05 and .01 levels of confidence. How level of
confidence in a mean or other statistic is dependent upon the accuracy
limits chosen may be shown in the following way. The sampling
distribution of a mean computed from a fairly large random
sample will be normal or very nearly normal (see Fig. 42). In a normal
distribution, 95% of the cases (Table A) fall between $\pm 1.96\sigma_M$ so
that the odds are 19:1 that any sample mean will lie within these
limits. Furthermore, 99% of the cases in a normal distribution fall
between $\pm 2.58\sigma_M$ and the odds are 99:1 that any sample mean will
lie within these limits. Conversely, 5% of the means can be expected
to lie outside the limits $\pm 1.96\sigma_M$ and 1% outside of the limits
$\pm 2.58\sigma_M$.

These two intervals ($\pm 1.96\sigma_M$ and $\pm 2.58\sigma_M$) constitute, then,
ranges or accuracy limits within which, for a known probability, our
sample mean will fall. Our faith in these limits is expressed by saying
that we may be “confident at the .05 level" that our M lies in the
range TM $\pm 1.96\sigma_M$; and “confident at the .01 level" that our mean
lies in the range TM $\pm 2.58\sigma_M$. We can expect to be wrong 5% of
the time if we take the .05 level and 1% of the time if we take the
.01 level. These levels .05 and .01 reflect degrees of assurance,
therefore, the .01 level deserving greater respect than the .05.
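
The multipliers 1.96 and 2.58, which the text takes from Table A, can also be recovered from the normal curve directly. The sketch below uses the `NormalDist` class of Python's standard `statistics` module; the two function names are assumptions of the sketch.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def multiplier_for_level(level):
    """Return the number of sigma_M units marking off the middle
    (1 - level) of a normal distribution, e.g. level = .05 or .01."""
    return UNIT_NORMAL.inv_cdf(1.0 - level / 2.0)


def share_within(z):
    """Return the proportion of a normal distribution within +/- z sigma."""
    return UNIT_NORMAL.cdf(z) - UNIT_NORMAL.cdf(-z)


for level in (0.05, 0.01):
    print(f"level {level:.2f}: +/- {multiplier_for_level(level):.2f} sigma_M")

for z in (1.0, 1.96, 2.58, 3.0):
    print(f"within +/- {z:.2f} sigma_M: {share_within(z):.2%}")
```

The second loop gives the proportion of sample means expected within any chosen number of $\sigma_M$ units, so that accuracy limits other than the two standard ones may be examined in the same way.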

We may illustrate the concept of confidence levels by reference to
Example (1), p. 184. Taking the $\pm 2.58\sigma_M$ accuracy limits, we may
be confident at the .01 level that 67.46 inches does not deviate from
the TM by more than ±.07 inch (±2.58 × .028). The expectation
of an error of ±.07 inch or more in our sample mean is expressed
by a probability of .01. It is extremely doubtful whether our measuring
instrument for height could detect an error of the order ±.07
inch. Therefore, an experimenter would be clearly justified in taking
the sample mean of 67.46 inches (with an SE of .028 inch) as highly
stable and deserving of great confidence.


(4) ESTABLISHING CONFIDENCE-INTERVALS FOR THE TM 


So far we have discussed reliability in terms of the probable
divergence of sample mean from TM. Another approach to the
problem of how best to describe the reliability of a statistic is
through the setting up of limits which for a given level of confidence
will embrace the TM. Such limits are said to define confidence-intervals.
The method of establishing such intervals is as follows.
It is clear from Figure 43 that in our sampling distribution of mean
heights, TM $\pm 3\sigma_M$ provides reasonable limits within which nearly
all (actually 99.73%) of our sample means can be expected to lie.
Since the TM itself is unknown, all that we can infer with respect to
this parameter is that it could take a range of values, one of which
is the given sample mean. Suppose we take $\pm 3\sigma_M$ as a fairly inclusive
working range (Fig. 43). Then if our M falls at the tentative
upper limit of the sampling distribution, TM = M − $3\sigma_M$; while if
M falls at the tentative lower limit of the sampling distribution,
TM = M + $3\sigma_M$. These relations are shown graphically in Figure
43. Since $\pm 3\sigma_M$ in a normal distribution include 99.73% of the
cases, the limits specified by M $\pm 3\sigma_M$ are said to define the 99.73%
confidence-interval. Evidently, we may feel confident to a degree
approaching certainty that the TM lies within this range.


FIG. 43 When M falls at $+3\sigma_M$, TM = M − $3\sigma_M$;
when M falls at $-3\sigma_M$, TM = M + $3\sigma_M$

Intervals which portray other degrees of confidence can be set up
in the same way. We know that 95% of the cases in a normal
distribution fall within the limits $\pm 1.96\sigma_M$ and that 99% fall within
the limits $\pm 2.58\sigma_M$ (Table A). If we take the limits specified by
M $\pm 1.96\sigma_M$, we define the 95% confidence-interval for the TM.
Basing our judgment on these limits, in a long series of experiments
we stand to be right 95% of the time and wrong 5%. For still greater
assurance, we may take the limits M $\pm 2.58\sigma_M$, which define the
99% confidence-interval for the TM.

Let us apply the concept of confidence-intervals to the problem of
heights on page 184. Taking as our limits M ± 1.96σ_M, we have
67.46 ± 1.96 × .028 or a confidence-interval limited by the points
67.41 and 67.51. If we say that this interval contains the TM the
probability of our being right is .95, of our being wrong .05. If we
desire a higher degree of assurance, we can take the 99%
confidence-interval. Here the limiting points are 67.39 and 67.53
(i.e., 67.46 ± 2.58 × .028). Our faith that these limits contain the
TM is expressed by a probability of .99.
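The arithmetic behind these two intervals can be sketched in a few lines of Python. This is an illustration only: the function name `confidence_interval` is our own, and the multipliers 1.96 and 2.58 are simply the normal-curve values quoted in the text.

```python
def confidence_interval(mean, se, multiplier):
    """Return (lower, upper) limits: mean minus/plus multiplier * se."""
    half_width = multiplier * se
    return mean - half_width, mean + half_width

# Height problem from the text: M = 67.46 inches, SE of the mean = .028 inch
M, SE_M = 67.46, 0.028

lower95, upper95 = confidence_interval(M, SE_M, 1.96)  # .95 confidence-interval
lower99, upper99 = confidence_interval(M, SE_M, 2.58)  # .99 confidence-interval

print(f".95 interval: {lower95:.2f} to {upper95:.2f}")
print(f".99 interval: {lower99:.2f} to {upper99:.2f}")
```

Widening the multiplier from 1.96 to 2.58 is the only change needed to pass from the .95 to the .99 interval.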

It may seem to many students that use of the confidence-interval
is an exceedingly roundabout way of making an inference concerning
the population mean; that it would be much more straightforward to
say “the chances are 95 in 100 that the TM lies between 67.41 and
67.51.” Such probability statements concerning the value of the
TM are often made and lead to what appears to be virtually the
same result as that given above in terms of confidence-intervals.
Theoretically, however, such inferences regarding the TM are
definitely incorrect, as the TM is not a variable which can take
several values but is a fixed point. The TM has only one value and the
probability that it equals some given figure is always either 100%
or zero: right or wrong. Our probability figures (e.g., .95 or .99)
do not relate to our confidence that the TM itself could take one of
several values within the given range. Rather, the probability used
in specifying confidence-intervals is an expression of our confidence
in the inference, namely, of our confidence that the given interval
includes the TM. This is a subtle point, but a valid one.

The limits of the confidence-interval of a parameter (TM) have
been called by Fisher * fiduciary limits, and the confidence to be
placed in the fiduciary limits as containing the given parameter is
called fiduciary probability. In terms of fiduciary probability, the
reliability of an obtained mean could be stated as follows: “The
fiduciary probability is .95 that the true mean lies in the interval
M ± 1.96σ_M, .05 that it lies outside these limits.”


(5) THE SE OF THE MEAN IN SMALL SAMPLES


It can be shown mathematically that the SD of a sample
systematically underestimates (is smaller than) the population σ,†
although this underestimation is not severe unless the samples are
quite small. To correct this tendency toward negative bias, we
must compute the standard deviation of a small sample by the formula

$$\sigma' = \sqrt{\frac{\Sigma x^2}{N - 1}}$$

rather than by the usual formula

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

(p. 51).

* Fisher, R. A., The Design of Experiments (London: Oliver and Boyd,
1935), pp. 200f.
† Holtzman, W. H., “The Unbiased Estimate of the Population Variance and
Standard Deviation,” Amer. Jour. Psychol., 1950, 63, 615-617.


When N is less than 50 or so (some statisticians say 30) the
formula for the SE of the mean should read

$$\sigma_M = \frac{\sigma'}{\sqrt{N}} \tag{40}$$

(standard error of the mean in small samples)

where σ' = √[Σx²/(N − 1)] and N = number of cases in the sample.


Formula (40) always provides the best estimate of the SE of the 
mean, i.e., of the SD of the sampling distributions of means (Fig. 42,
p. 184), no matter what the size of N. In very large samples, however,
the correction effected by using (40) is so slight as to be
negligible and formula (39) may be safely used. When N is less than 
50 it is advisable to use the more exact formula, and it is imperative 
that we do so when N is quite small—less than 10, say. 
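As a minimal sketch of the difference between the two standard deviations and of formula (40), the following Python lines compute both from raw scores. The function names and the five scores are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

def sd_usual(scores):
    """Usual SD: square root of (sum of squared deviations / N)."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

def sd_small_sample(scores):
    """Small-sample SD: square root of (sum of squared deviations / (N - 1))."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (len(scores) - 1))

def se_mean_small_sample(scores):
    """Formula (40): small-sample SD divided by the square root of N."""
    return sd_small_sample(scores) / math.sqrt(len(scores))

scores = [5, 6, 7, 8, 9]  # hypothetical scores, for illustration only

print(round(sd_usual(scores), 3))              # divisor N
print(round(sd_small_sample(scores), 3))       # divisor N - 1, always the larger
print(round(se_mean_small_sample(scores), 3))  # SE of the mean by (40)
```

Python's standard library offers the same two quantities as `statistics.pstdev` (divisor N) and `statistics.stdev` (divisor N − 1), which makes a convenient check on hand computation.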

When we are dealing with small samples, the normal distribution
no longer tells us accurately the amount by which a statistic
probably differs from its parameter. The sampling distribution to be
used when N is small is not as tall as the normal curve and the
tails or ends are somewhat higher. Figure 44 shows graphically
how this distribution, called the t-distribution or “Student's” *
distribution, compares with the normal. The student should note that
the t-distribution does not differ greatly from the normal unless N
is quite small; and that as N increases in size the t-distribution
approaches more and more closely to the normal form. In the case of
the sampling distribution of the mean, t = (M − TM)/SE_M or x/σ_M;
that is, t is essentially a σ-score (p. 305).

FIG. 44 Distribution of t for different degrees of freedom. When the
number of df is very large, the distribution of t is virtually normal

Selected points in the t-distribution are given in Table D. For N's
varying in size, this table gives the ±t distances beyond which (i.e.,
to the left and right) certain percentages of the sampling
distribution lie. These percents are .10, .05, .02, and .01. An
illustration will make clear the use of Table D in small samples and
will introduce the new concept “degrees of freedom” (see p. 193).


Example (2) Ten measures of reaction time to a light stimulus
are taken from one practiced observer. The mean is 175.50 ms
and the σ' is 5.82 ms. How reliable is this mean?

From formula (40) we find that σ_M = 5.82/√10 or 1.84 ms. By
definition, t = (M − TM)/SE_M, or in the present case,
t = (175.50 − TM)/1.84. We do not know, of course, the value of the
TM in the t-equation; but if we know the proper number of degrees of
freedom we can determine the value of t at selected points in the
t-distribution. The df (degrees of freedom) available for evaluating
the given t are (N − 1) or 9. Entering Table D with 9 df we read
that t = 2.26 at the .05 point and 3.25 at the .01. From the first t
we know that 95% of sample means like 175.50 ms, the mean we have,
lie between the TM and ±2.26σ_M, and that 5% fall outside these
limits. From the second t we know that 99% of our sample means lie
between the population mean and ±3.25σ_M and that 1% falls outside
these limits. We may be confident at the .05 level, therefore, that
our mean of 175.50 ms does not differ from its parameter (TM) by more
than ±4.16 ms (2.26 × 1.84); and we may be confident at the .01 level
that our mean does not miss the TM by more than ±5.98 ms
(3.25 × 1.84).

* “Student” was the pseudonym of W. S. Gosset who developed the
t-distribution. See Walker, Helen M., Elementary Statistical Method
(New York: Henry Holt and Co., 1943), p. 159.

Confidence-intervals may also be established for the TM in
this problem by the methods of page 187. Taking as our limits
M ± 2.26σ_M, we have 175.50 ± 4.16 or 171.34-179.66 as indicating
the limits of our .95 confidence-interval. Or taking M ± 3.25σ_M as
broader limits, we have 175.50 ± 5.98 or 169.52-181.48 as marking
off our .99 confidence-interval. If we infer that the population mean
lies within the latter interval, in a long series of experiments we
stand to be right 99% and wrong 1% of the time. The width of the
.99 confidence-interval (11.96) shows clearly the high unreliability
likely to exist in a mean when our estimate is based upon a very
small sample.
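The steps of Example (2) may be retraced with a short Python sketch. The t-values 2.26 and 3.25 are those read from Table D for 9 df in the text; the variable and function names are our own and carry no special authority.

```python
import math

N = 10            # number of reaction-time measures
M = 175.50        # sample mean, ms
SD_SMALL = 5.82   # small-sample SD (sigma-prime), ms

# t-values for 9 df, as read from Table D in the text
T_05, T_01 = 2.26, 3.25

se_mean = SD_SMALL / math.sqrt(N)   # formula (40)

def t_interval(mean, se, t_value):
    """Return (lower, upper) limits: mean minus/plus t_value * se."""
    half_width = t_value * se
    return mean - half_width, mean + half_width

ci95 = t_interval(M, se_mean, T_05)  # .95 confidence-interval
ci99 = t_interval(M, se_mean, T_01)  # .99 confidence-interval

print(f"SE of the mean: {se_mean:.2f} ms")
print(f".95 interval: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f".99 interval: {ci99[0]:.2f} to {ci99[1]:.2f}")
```

The procedure is the same as for large samples; only the multiplier changes, being taken from the t-table for (N − 1) df instead of from the normal curve.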

Several points in the solution of this problem deserve further com- 
ment as they illustrate clearly the difference between confidence 
levels in large and small samples. Had we used formula (39) in Ex- 
ample (2) instead of the correct formula (40), the SE of our mean 
would have been 1.75 ms instead of 1.84 ms, 5% too small. Again, 
the .05 and .01 confidence levels in the normal curve are ±1.96 and
±2.58 (p. 187). These limits are roughly 15% and 20% smaller than the
correct t-limits of ±2.26 and ±3.25 read from Table D for 9 df. It
is clear, therefore, that when N is quite small, use of formula (39) 
will cause a calculated mean to appear more accurate than it actu- 
ally is. 

The SE of the mean in the height problem on page 184 was 2.57/√8585
or .028 inch. The student should note that had formula (40) and
Table D been used in determining the reliability of the obtained
mean of 67.46 inches, results would not have differed to the third
decimal from those got with formula (39) and Table A. This is true,
of course, because the N of 8585 is very large. As N increases (see
Fig. 44), t-entries in Table D approach more and more closely the
corresponding normal curve deviates in Table A. In the normal curve,
for instance (see Table A), 10% of the distribution lie beyond the
limits ±1.65, 5% beyond the limits ±1.96, and 1% beyond the limits
±2.58. In Table D the corresponding t-limits [...] ±1.68, ±2.01,
±2.68. For (N [...] groups smaller than 50, small-sample statistics
are not as generally useful in psychology and education as they are
in biology and agriculture.*


(6) DEGREES OF FREEDOM 

The concept of “degrees of freedom” which we have encountered 
on pages 191-192 is highly important in small-sample statistics. It 
is crucial, too, in analysis of variance and will appear increasingly 
often in later chapters. The degrees of freedom (df) available for 
evaluating a statistic depend upon the number of restrictions placed 
upon the observations—one df being lost for each restriction imposed. 
Where one restriction comes from can best be shown by a simple 
example. If we have five scores, 5, 6, 7, 8, and 9, the M is 7; and the 
deviations of these scores around 7 are —2, —1, 0, 1, and 2. The sum 
of these deviations is zero.† While there are 5 deviations, only 4 of
these [(N − 1)] can be freely selected as the condition that the sum
of the deviations equals zero immediately fixes the fifth deviation.
When there are N independent scores, there are N degrees of freedom
for computing the M, but only (N − 1) df available for the SD
since this statistic is computed from deviations taken around the M.
In Example (2), page 191, the df available for determining the
reliability of the M were given as (N − 1) or 9: one less than the
number of observations (i.e., 10). Since one df was lost in computing the
M only (N — 1) are left for estimating the reliability of the M by 


* Snedecor, George W., Statistical Methods (4th ed.; Ames, Iowa: Iowa State
College Press, 1946), Chaps. 3 and 8.
† Σ(X − M) or Σx (calculation algebraic) is always zero.

Our best estimate of the true (or population) σ (see p. 190) is
obtained by using (N − 1) instead of N in the formula for the σ; that
is, by taking due account of the restriction imposed through
calculation of the M. It is quite important that we take df into
account when N is small; unimportant practically that we do so when
N is large (p. 192). In general the number of degrees of freedom
available at any given time equals N minus the number of parameters
already estimated from the N observations (each parameter adds a
restriction). M is the only parameter estimated before computing the
SD, and accordingly the df available in Example (2) were (N − 1). The
number of df is not always (N − 1), however, but will vary with the
statistic. In determining confidence levels for r, for example, the
available df are (N − 2). Two df are lost, one restriction being
imposed for the M of Y (the dependent variable) and another for the
regression coefficient b, which describes the relation between Y and
X.* Rules for determining the df available in the Chi-square test
and in analysis of variance tables are given in appropriate places in
later chapters.
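The restriction described in the five-score example above can be checked directly. The short Python sketch below (variable names are ours) shows that once any four deviations are chosen, the fifth is forced by the requirement that the deviations sum to zero.

```python
scores = [5, 6, 7, 8, 9]               # the five scores used in the text
mean = sum(scores) / len(scores)       # M = 7
deviations = [x - mean for x in scores]

# Any N - 1 deviations may be chosen freely ...
free_deviations = deviations[:-1]
# ... but the last one is then fixed, since all N must sum to zero.
forced_last = -sum(free_deviations)

degrees_of_freedom = len(scores) - 1   # one df lost in computing M

print(deviations)          # the five deviations around M
print(forced_last)         # equals the fifth deviation
print(degrees_of_freedom)
```

The same reasoning extends to any statistic: each quantity already estimated from the data fixes one more value and removes one more df.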


2. The reliability of the median 


The standard error of the median is roughly 5/4 times σ_M. In terms
of σ and Q, the SE's of the median are

$$\sigma_{Mdn} = \frac{1.253\sigma}{\sqrt{N}} \tag{41}$$

$$\sigma_{Mdn} = \frac{1.858Q}{\sqrt{N}} \tag{42}$$

(standard error of the median in terms of σ and of Q)

An example will illustrate the use of formula (42).

Example (3) On the Trabue Language Scale A, 801 twelve-year-old
boys made the following record: Median = 21.40; Q = 4.9.
How reliable is this median? How well does it represent the
median of twelve-year-old boys in general on the given scale?

By formula (42) the σ_Mdn = 1.858 × 4.9/√801 or .32. Since N is quite
large, accuracy limits may be taken at ±1.96 and ±2.58 (last line
of Table D). We may be confident at the .05 level that the
median of 21.40 does not miss the population median by more than
±1.96 × .32 or ±.63; and confident at the .01 level that 21.40 does
not differ from the true median by more than ±2.58 × .32 or by
±.83. The .99 confidence-interval for the true median is 21.40 ± .83
or from 20.57 to 22.23. This very narrow range (for which the
P = .99) indicates high stability in the computed median.
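Formulas (41) and (42) translate directly into code. The sketch below, with function names of our own choosing, reworks Example (3) using the constants 1.253 and 1.858 given in the text.

```python
import math

def se_median_from_sd(sd, n):
    """Formula (41): SE of the median in terms of the SD."""
    return 1.253 * sd / math.sqrt(n)

def se_median_from_q(q, n):
    """Formula (42): SE of the median in terms of Q."""
    return 1.858 * q / math.sqrt(n)

# Example (3): Median = 21.40, Q = 4.9, N = 801
median, q, n = 21.40, 4.9, 801
se_mdn = se_median_from_q(q, n)

# .99 confidence-interval, using the normal-curve limit 2.58
lower99 = median - 2.58 * se_mdn
upper99 = median + 2.58 * se_mdn

print(f"SE of the median: {se_mdn:.2f}")
print(f".99 interval: {lower99:.2f} to {upper99:.2f}")
```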


III. The Reliability of Measures of Variability 


1. The reliability of the standard deviation 


The reliability of the SD, like the reliability of the mean and
median, is determined by calculating the probable discrepancy
between the obtained SD and its parameter (true or population SD).
The formula for the SE of the σ is

$$SE_\sigma \text{ or } \sigma_\sigma = \frac{.71\sigma}{\sqrt{N}} \tag{43}$$

(standard error of a standard deviation)

The sampling distribution of σ is skewed for small samples (N less
than 25, say). When samples are large, however (greater than 100),
and have been drawn at random from a normal population, formula
(43) may be applied and interpreted in the same way as SE_M. To
illustrate, we found on page 184 that for 8585 British males, the SD
around the M of 67.46 inches was 2.57 inches. By formula (43)
σ_σ = .71 × 2.57/√8585 or .02 inch. Since N is large, the .99
confidence-interval for the true or population SD can be taken as
σ ± 2.58σ_σ. Substituting for σ and σ_σ we have 2.57 ± 2.58 × .02 or
2.52 to 2.62 as our .99 confidence-interval. If we proceed on the
assumption that the true SD lies within this range we will, in a long
series of experiments, be right 99% and wrong 1% of the time.

It is not often that we are called upon to compute the SE of σ in
small samples. This is fortunate, as there is no very efficient way of
estimating the reliability of σ when the sample is small.

* See page 154.

2. The reliability of the quartile deviation or Q 


The reliability of Q may be found from the formulas

$$\sigma_Q = \frac{.786\sigma}{\sqrt{N}} \tag{44}$$

and

$$\sigma_Q = \frac{1.17Q}{\sqrt{N}} \tag{45}$$

(standard error of Q in terms of σ and of Q)

These formulas are applied and interpreted as are the other SE
formulas. On page 194, for example, the median of the 801
twelve-year-old boys who took the Trabue Completion Test was 21.40
with a Q of 4.9. The SE of this Q by (45) is 1.17 × 4.9/√801 or .20.
The .95 confidence-interval for the true or population Q may be taken
as 4.5 to 5.3, i.e., 4.9 ± 1.96 × .20. The narrow range of the .95
confidence-interval indicates high stability.
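A brief sketch of formulas (44) and (45) in Python follows; the function names are ours, and the data are those of the Trabue example in the text.

```python
import math

def se_q_from_sd(sd, n):
    """Formula (44): SE of Q in terms of the SD."""
    return 0.786 * sd / math.sqrt(n)

def se_q_from_q(q, n):
    """Formula (45): SE of Q in terms of Q itself."""
    return 1.17 * q / math.sqrt(n)

q, n = 4.9, 801
se_q = se_q_from_q(q, n)

# .95 confidence-interval for the population Q
lower95 = q - 1.96 * se_q
upper95 = q + 1.96 * se_q

print(f"SE of Q: {se_q:.2f}")
print(f".95 interval: {lower95:.1f} to {upper95:.1f}")
```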
IV. The Reliability of Percentages
and Correlation Coefficients


This section will consider the computation and use of the SE's of
a percentage and a correlation coefficient. For the SE's of other
statistics the student should go to the more advanced references. The
Handbook of Statistical Nomographs, Tables and Formulas, by 
Dunlap and Kurtz (World Book Co., 1932), contains many formulas 
helpful in research. 


1. The reliability of a percentage 


It is often feasible to find the percentage of a given group which
exhibits certain behaviors or possesses certain definite attitudes or
other characteristics when it is difficult or impossible to measure
these attributes directly. Given the percentage occurrence of a
behavior the question often arises of how much confidence we can
place in our figure. How reliable an index is it of the incidence of the
behavior in which we are interested? To answer this question, we
must go to the SE of the percentage given by the formula

$$\sigma_{\%} = \sqrt{\frac{PQ}{N}} \tag{46}$$

(SE of a percentage)

in which P = the percentage occurrence of the behavior, Q = 1 − P
(that is, 100% − P when P is expressed as a percentage), and N is the
number of cases.
We may illustrate formula (46) with the following problem.
We may illustrate formula (46) with the following problem. 


Example (1) In a study of cheating among elementary school
children, 144 or 41.4% of 348 children from homes of good
socio-economic status were found to have cheated on the various tests.
Assuming our sample to be representative of children from “good”
social levels, how much confidence can be put in this percentage?
How well does it represent the population or true percentage?

Applying formula (46) we get that σ_% = √(41.4% × 58.6%/348) = 2.6%.
The sampling distribution of percents can be taken as normal when N
is large (larger than 50, say) and when P is not too close
to 0% or 100%. The SE is interpreted like σ_M. Thus in the present
problem the .99 confidence-interval for the population percentage is
41.4% ± 2.58 × 2.6 or from 34.7% to 48.1%. We may feel sure,
therefore, that the percentage of children who will cheat in samples
of this sort is at least as large as 34.7 and not larger than 48.1. The
SE of a percentage finds its chief use in problems in which the
significance of the difference between two percentages is to be
determined (p. 236).
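Formula (46) can be evaluated directly for the cheating example. In the sketch below the function name is our own, P and Q are both carried as percentages, and the SE is rounded to one decimal before the interval is formed, which is an assumption about how much precision to carry. Evaluated this way the SE comes to about 2.6%.

```python
import math

def se_percentage(p_percent, n):
    """Formula (46): sqrt(P * Q / N), with P and Q both in percent."""
    q_percent = 100.0 - p_percent
    return math.sqrt(p_percent * q_percent / n)

p, n = 41.4, 348                 # 41.4% of 348 children
se_pct = se_percentage(p, n)
se_rounded = round(se_pct, 1)    # SE carried to one decimal

# .99 confidence-interval for the population percentage
lower99 = p - 2.58 * se_rounded
upper99 = p + 2.58 * se_rounded

print(f"SE of the percentage: {se_pct:.2f}%")
print(f".99 interval: {lower99:.1f}% to {upper99:.1f}%")
```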




2. The reliability of the coefficient of correlation (r) 


(1) THE SE OF r

The classical * formula for the SE of r is

$$\sigma_r = \frac{1 - r^2}{\sqrt{N}} \tag{47}$$

(SE of a coefficient of correlation r when N is large)

In the height-weight problem on page 129, r = .60 and N = 120. The
SE of r by formula (47), therefore, is (1 − .60²)/√120 or .06 (to two
decimals). To test the reliability of r in terms of its SE, we assume
the sampling distribution of r to be normal, place the “true r” at the
center (Fig. 45) of the distribution, and take .06 (i.e., SE_r) to be
the SD of this sampling distribution of r's. Since the probability is
.05 of an error exceeding ±1.96σ_r, there is only one chance in 20
that an error of ±.12 or more exists in our r. Again, the .99
confidence-interval for the true r can be taken as r ± 2.58σ_r.
Substituting for r and SE_r we get .45 and .75 as the limits of our
.99 confidence-interval. It would seem reasonably certain, therefore,
that r is at least as large as .45.

FIG. 45 There are 95 chances in 100 that the obtained r does not miss
the true r by more than ±.12 (±1.96σ_r). The .99 confidence-interval
for the true r is r ± 2.58σ_r, or .60 ± .15, i.e., .45 to .75

* Yule, G. Udny, An Introduction to the Theory of Statistics (10th ed.;
London: ...).
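The height-weight illustration can be reproduced with a few lines of Python. The function name is hypothetical, and the SE is rounded to two decimals before use because the text works with .06.

```python
import math

def se_r_classical(r, n):
    """Formula (47): (1 - r^2) / sqrt(N); intended for large N only."""
    return (1 - r ** 2) / math.sqrt(n)

r, n = 0.60, 120
se_r = round(se_r_classical(r, n), 2)   # the text carries two decimals

error_05 = 1.96 * se_r                  # error exceeded only 5 times in 100
lower99 = r - 2.58 * se_r               # limits of the .99 interval
upper99 = r + 2.58 * se_r

print(f"SE of r: {se_r:.2f}")
print(f".05 error limit: {error_05:.2f}")
print(f".99 interval: {lower99:.2f} to {upper99:.2f}")
```

The sketch inherits both objections raised against formula (47): it substitutes the sample r for the true r, and it assumes a normal sampling distribution.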

There are two serious objections to the use of formula (47). In
the first place, the r in the formula is really the true or population r.
Since we do not have the true r, we must substitute the calculated or
sample r in the formula in order to get an estimate of the standard
error. If the obtained r is in error, our estimate will also be in error;
and at best it is approximate.

In the second place, the sampling distribution of r is not normal
except when the population r is .00 and N is large. When r is high
(.80 or more, say) and N is small, the sampling distribution of r is
skewed and the SE from (47) is decidedly misleading. Skewness in
the sampling distribution of high r's grows out of the fact that the
range of r is from +1.00 to −1.00. If r = .80 and N = 20, the
probability of an r less than .80 in a new sample of 20 cases is much
greater than the probability of an r greater than .80 because of the
obtained r's nearness to unity. The distribution of r's obtained from
successive samples of 20 cases will be skewed negatively (p. 98) and
the skewness will increase as r increases. For values of r between
±.50, and for N's of 100 or more, the distribution of r in successive
samples will conform fairly closely to the normal curve and formula
(47) will yield a useful estimate of reliability. But unless used with
caution, SE_r is likely to be misleading.

A mathematically more defensible method of testing the
significance of an r, especially when the coefficient is high, is to
convert r into R. A. Fisher's z-function * and find the SE of z. The
function z has two advantages over r: (1) its sampling distribution is
approximately normal and (2) its SE depends only upon the size of the
sample N, and is independent of the size of r. The formula for σ_z is

$$\sigma_z = \frac{1}{\sqrt{N - 3}} \tag{48}$$

(SE of Fisher's function z)

Suppose that r = .85, and N = 52. Then from Table C we read
that an r of .85 corresponds to a z of 1.26. SE_z from (48) is 1/√49
or .14. The .95 confidence-interval for the true z is now .99 to 1.53
(i.e., 1.26 ± 1.96 × .14 or 1.26 ± .27). Converting these z's back
into r's we get a confidence-interval of from .76 to .91. The fiduciary
probability is .95 that this interval contains the true r (p. 189).

The coefficient of correlation of .60 in the height-weight problem
is not large enough for the conversion into z to make much
difference in our reliability estimates. An r of .60 is equivalent to
a z of .69 (Table C), and the SE_z is 1/√117 or .09 (to two decimals).
The .99 confidence-interval for the true z, therefore, is .46 to .92
(i.e., .69 ± 2.58 × .09 or .69 ± .23). When we convert these z's back
into r's the .99 confidence-interval for the true r becomes .43 to
.73. This range is almost identical with that on page 198 obtained
when we used r and SE_r.

* Fisher, R. A., Statistical Methods for Research Workers (8th ed.;
London: Oliver and Boyd).
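In place of Table C, the z-function may be computed as the inverse hyperbolic tangent of r, and z converted back to r with the hyperbolic tangent. The sketch below assumes that identity and uses unrounded values throughout, so its limits may differ in the second decimal from those obtained with table values rounded at each step. The function name is our own.

```python
import math

def fisher_z_interval(r, n, multiplier):
    """Confidence limits for r by way of Fisher's z.

    z = atanh(r); SE of z = 1 / sqrt(N - 3), formula (48);
    the limits in z are converted back to r with tanh.
    """
    z = math.atanh(r)
    se_z = 1.0 / math.sqrt(n - 3)
    z_lower = z - multiplier * se_z
    z_upper = z + multiplier * se_z
    return math.tanh(z_lower), math.tanh(z_upper)

# r = .85, N = 52: .95 confidence-interval
low_85, high_85 = fisher_z_interval(0.85, 52, 1.96)

# r = .60, N = 120: .99 confidence-interval
low_60, high_60 = fisher_z_interval(0.60, 120, 2.58)

print(f"r = .85: {low_85:.2f} to {high_85:.2f}")
print(f"r = .60: {low_60:.2f} to {high_60:.2f}")
```

Note that the interval for the high r is not symmetrical about .85; the conversion back from z restores the skewness of the sampling distribution of r.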


(2) TESTING r AGAINST THE NULL HYPOTHESIS

The reliability of an obtained r may be tested also against the
hypothesis that the population r is in fact zero.* If the computed r
is large enough to invalidate or cast serious doubt upon this null
hypothesis we accept r as indicating the presence of at least some
degree of correlation. To make the test, enter Table 25 with
(N − 2) degrees of freedom * and compare the obtained r with the
tabulated entries. Two significance levels, .05 and .01, are given in
Table 25, which is read as follows when, for example, r = .60 and
N = 120. For 118 df the entries at .05 and .01 are by linear
interpolation .18 and .24, respectively (to two decimals). This means
that only 5 times in 100 trials would an r as large as ±.18 arise from
fluctuations of sampling, if the population r were actually .00; and
only once in 100 trials would an r of ±.24 appear if the population r
were in fact .00 (Fig. 46). It is clear that the obtained r of .60, since
it is much larger than .24, is significant at the .01 level.

FIG. 46 When the population r is zero, and df = 118, 5% of the sample
r's exceed ±.18, and 1% exceeds ±.24

* See page 213 for definition of null hypothesis.


TABLE 25 † Correlation coefficients at the 5% and 1% levels of
significance

Example: When N is 52 and df is 50, an r must be .273 to be significant at
.05 level, and .354 to be significant at .01 level.

Degrees of                     Degrees of
freedom       .05     .01      freedom       .05     .01
(N − 2)                        (N − 2)

    1        .997   1.000         24        .388    .496
    2        .950    .990         25        .381    .487
    3        .878    .959         26        .374    .478
    4        .811    .917         27        .367    .470
    5        .754    .874         28        .361    .463
    6        .707    .834         29        .355    .456
    7        .666    .798         30        .349    .449
    8        .632    .765         35        .325    .418
    9        .602    .735         40        .304    .393
   10        .576    .708         45        .288    .372
   11        .553    .684         50        .273    .354
   12        .532    .661         60        .250    .325
   13        .514    .641         70        .232    .302
   14        .497    .623         80        .217    .283
   15        .482    .606         90        .205    .267
   16        .468    .590        100        .195    .254
   17        .456    .575        125        .174    .228
   18        .444    .561        150        .159    .208
   19        .433    .549        200        .138    .181
   20        .423    .537        300        .113    .148
   21        .413    .526        400        .098    .128
   22        .404    .515        500        .088    .115
   23        .396    .505       1000        .062    .081


Table 25 takes account of both ends of the sampling distribution;
that is, it does not consider the sign of r. When N = 120, the
probability (P/2) of an r of .18 or more arising on the null
hypothesis is .025; and the probability of an r of −.18 or less is, of
course, .025 also. For a P/2 of .01 (or P of .02) the r by linear
interpolation between .05 (.18) and .01 (.24) is .21. On the
hypothesis of a population r of zero, therefore, only once in 100
trials would a positive r of .21 or larger arise through accidents of
sampling.

The .05 and .01 levels in Table 25 are the only ones needed
ordinarily in evaluating the significance of an obtained r. Several
illustrations of the use of Table 25 in determining significance are
given below:

* Page 193.
† This table is abstracted from the column for 2 variables in Table J,
page ...

Size of      Degrees of
Sample       Freedom      Calculated    Interpretation
  (N)        (N − 2)          r

   10            8           .70        significant at .05,
                                        not at .01 level
  152          150          −.12        not significant
   27           25           .50        significant at .05,
                                        barely at .01 level
  500          498           .20        very significant
  100           98          −.80        very significant


It is clear from these examples that even a small r may be
significant if computed from a very large sample, and that an r as
high as .70 may not be significant if N is quite small. Table 25 is
especially useful when N is small. Suppose that we have found an r of
.55 from a sample of 12 cases. Entering Table 25 with (N − 2) or 10 df
we find that r must be .71 to be significant at the .01 level and .58
to be significant at the .05 level. In this small sample, therefore,
even an r as high as .55 cannot be taken as indicative of any real
correlation.
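The table look-up just described is mechanical and can be sketched in Python. Only three rows of Table 25 (df = 8, 10, and 25) are copied into the hypothetical dictionary below; a full program would need the whole table and some rule for interpolating between tabled df.

```python
# A few entries from Table 25: df -> (r needed at .05, r needed at .01)
TABLE_25 = {
    8: (0.632, 0.765),
    10: (0.576, 0.708),
    25: (0.381, 0.487),
}

def significance_of_r(r, n):
    """Compare |r| with the Table 25 entries for N - 2 degrees of freedom."""
    r_05, r_01 = TABLE_25[n - 2]
    size = abs(r)            # the table ignores the sign of r
    if size >= r_01:
        return "significant at .01"
    if size >= r_05:
        return "significant at .05, not at .01"
    return "not significant"

print(significance_of_r(0.70, 10))   # N = 10, df = 8
print(significance_of_r(0.50, 27))   # N = 27, df = 25
print(significance_of_r(0.55, 12))   # N = 12, df = 10
```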


V. Sampling and the Use of Reliability Formulas

All of the reliability formulas given in this chapter depend upon
the size of the sample, and most of them require some measure of
variability (usually σ). It is unfortunate, perhaps, that there is
nothing in the statement of an SE formula which might deter the
uncritical worker from applying it to the statistics calculated from
any set of test scores. But the general and indiscriminate computation
of SE's will inevitably lead to erroneous conclusions and false
interpretations. Hence, it is highly important that the research
worker in experimental psychology and in educational research have
clearly in mind (1) the conditions under which reliability formulas
are (and are not) applicable; and that he know (2) what his
reliability formulas may be reasonably expected to do. Some of the
limitations to reliability formulas have been given in this chapter.
These statements will now be amplified and further cautions to be
observed in the use of SE's will be indicated.


1. Methods of sampling 


Various techniques have been devised for obtaining a sample
which will be representative of its population. The adequacy of a
sample (i.e., its lack of bias) will depend upon our knowledge of the
population or supply * as well as upon the method used in drawing
the sample. Commonly used sampling methods will be described in
this section under four headings: random, stratified or quota,
incidental, and purposive.


(1) RANDOM SAMPLING 


The descriptive term “random” is often misunderstood. It does not
imply that the sample has been chosen in an offhand, careless or
haphazard fashion. Instead it means that we rely upon a certain method
of selection (called “random”) to provide an unbiased cross section
of the larger group or population. The criteria for randomness in a
sample are met when (1) every individual (or animal or thing) in
the population or supply has the same chance of being chosen for
the sample; and (2) when the selection of one individual or thing
in no way influences the choice of another. Randomness in a sample
is assured when we draw similar and well shaken-up slips out of
a hat; or numbers in a lottery (provided it is honest); or a hand
from a carefully shuffled deck of cards. In each of these cases
selection is made in terms of some mechanical process and is not
subject to the whims or biases (if any) of the experimenter.

A clear distinction should be made between representative and
random samples. A representative sample is one in which the
distribution of scores in the sample closely parallels that of the
population. Experience has shown that if one is asked to get
representative samples from a population he will for various reasons
(some not recognized) often draw samples which exhibit consistent
biases of one sort or another. The most trustworthy way of securing
representativeness, therefore, is to make sure that the sampling is
random. If we draw samples at random from the population we know at
least that (a) there will be no consistent biases; (b) on the average
these samples will be representative; (c) the degree of discrepancy
likely to occur in any given sample can be determined by probability
methods. The SE formulas given in this chapter apply only to random
samples.

* A supply usually means a population of objects or things.

In research problems in psychology and in education three
situations arise in connection with the drawing of a random sample:
(a) the members of the population or supply are on file or have been
catalogued in some way; (b) the form of the distribution of the
trait in the population is known to be (or can reasonably be assumed
to be) normal; (c) the population is known only in general terms.
These situations will be discussed in order.

(a) Members of population are on file or are catalogued. If the
population has been accurately listed, a type of systematic selection
will provide what is approximately a random sample. Thus we may
take every fifth or tenth name (depending upon the size of the sample
wanted) in a long list, provided names have been put in alphabetical
order and are not arranged with respect to some differential
factor, such as age, income or education. (A better plan in such cases
is to assign numbers to the members of the population and draw a
sample as described below.) By this method an approximately
random sample of telephone users may be obtained by reference to
the telephone directory; of sixth grade children from attendance
rolls; of automobile owners from the licensing bureau; of workers in
a factory from payroll lists. Random samples of the population with
respect to a variety of characteristics may be drawn in the same way
from census data.

Systematic selection from a catalogued population is often used in
determining the acceptance rate of industrial products. Thus in
sampling machine-produced articles for defectives, a random sample
may be obtained by taking every tenth article, say, as it comes from
the machine. Sampling of this sort is justified if the manufactured
articles are taken just as they come from the machine, so that
systematic selection provides an approximately random sample from the
supply.
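Taking every tenth entry from a list is easily expressed in code. The sketch below is illustrative only: the function name and the list of fifty placeholder names are hypothetical, and, as the text warns, the result approximates a random sample only when the list is not ordered by some factor related to the trait under study.

```python
def systematic_sample(items, step, start):
    """Take every `step`-th item, beginning at index `start` (0-based)."""
    return items[start::step]

# A hypothetical alphabetical list of fifty names
names = [f"name_{i:03d}" for i in range(1, 51)]

# Every tenth name: the 10th, 20th, 30th, 40th, and 50th
every_tenth = systematic_sample(names, step=10, start=9)

print(every_tenth)
```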

When the subjects in a group are to be assigned at random to one
or more experimental and control sub-groups, tables of random
numbers may be used to good purpose.* In such tables, numbers arranged
by a chance procedure are printed in sequence. The tenth block of
25 numbers taken from Fisher's and Yates' Table and reproduced
below will serve as an example.

34 50 57 74 37
85 22 04 39 43
09 79 13 77 48
88 75 80 18 14
90 96 23 70 00

* Fisher, R. A., and Yates, F., Statistical Tables (New York: Hafner
Publishing Co., 1948), Table 33.


The Fisher-Yates table is made up of 300 similar blocks of 25 num- 
bers, printed on 6 pages of 10 rows and 5 columns each. To read 
from the table one may begin at any point on any page and read in 
any direction, up or down, right or left. When all of the individuals 
in the entire group or population have been numbered in 1, 2, 3 order, 
a random sample of any size can be drawn by following in order the 
numbers read from the table. Suppose, for example, that a random 
sample of 25 is to be drawn from a larger "population" of 100. Then 
if we have decided beforehand to start with the second column in the 
block above and read down, individuals numbered 50, 22, 79, 15, 
and 96 will be included. Other blocks chosen in advance may be used
to provide the additional 20 subjects. If the same number occurs
twice, the second draw is disregarded.
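
The drawing procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_sample` and the list `readings` are hypothetical, the readings are made up rather than copied from any published table, and a reading of 0 is simply skipped (whether "00" should stand for individual 100 is a convention the investigator must fix beforehand).

```python
def draw_sample(table_numbers, population_size, sample_size):
    """Pick distinct individuals by following numbers read from a table.

    table_numbers: integers in the order they are read from the table.
    Numbers that label nobody (outside 1..population_size) are skipped,
    and a number that has already been drawn is disregarded.
    """
    chosen = []
    seen = set()
    for number in table_numbers:
        if not 1 <= number <= population_size:
            continue  # no individual carries this number
        if number in seen:
            continue  # same number occurs twice: second draw disregarded
        seen.add(number)
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Hypothetical readings, NOT taken from the Fisher-Yates table.
readings = [12, 47, 12, 3, 88, 47, 61, 0, 29]
sample = draw_sample(readings, population_size=100, sample_size=5)
print(sample)
```

If the readings run out before the sample is complete, the function returns a shorter list; in practice one would go on to a further block chosen in advance.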

(b) Distribution of trait in population known. As a result of much
research in individual differences many physical and mental traits 
are believed to be normally distributed (at least approximately) in 
the population. If we are justified in assuming that the trait or 
ability in which we are interested is normally distributed in the 
general population, a sample drawn at random from this population 
will itself tend toward normality, so that symmetry of distribution 
becomes an excellent criterion of sample adequacy. 

(c) Population known only in general terms. In many problems 
in psychology and in education the population is (1) not clearly de- 
fined, (2) not readily accessible for sampling (for example, the 
population of a state), and (3) very expensive to sample extensively.
Under conditions such as these a useful test of the adequacy of a
sample consists in drawing several samples at random and in succes- 
sion from the population, such samples to be of approximately the 
same size as the sample with which we are working. Random sam- 
ples of ten-year-old school boys in a large school system, for instance, 
must be drawn without bias as to able, mediocre, or poor individuals;
they cannot be drawn exclusively from poor neighborhoods, from
expensive private schools, or from any larger group in which special 
factors are likely to make for systematic differences. 

When the means and σ’s of these presumably random samples are
closely alike we may feel reasonably sure that our samples are repre- 
sentative of their population. If the correspondence among samples 
is not close we must re-examine each sample for bias. This test 
has been criticized on the grounds that (1) the correspondence of two 
or more samples may reflect nothing more than a common bias, and 
(2) consistency is not a sufficient criterion of randomness. While 
this is true and the test is admittedly rough, it may be argued that 
a reasonable consistency among samples is a necessary first condi- 
tion of randomness. If samples are fairly consistent, therefore, 
they are presumably random unless subsequent examination reveals 
a common bias. If samples differ widely, we cannot be sure that any
is random.


(2) STRATIFIED OR QUOTA SAMPLING 

Stratified or quota sampling (also called “controlled” sampling)
is a technique designed to insure representativeness and avoid bias
by use of a modified random sampling method. This scheme is
applicable when the population is composed of sub-groups or strata
of different sizes, so that a representative sample must contain
individuals drawn from each category or stratum in accordance
with the sizes of the sub-groups. Within each stratum or sub-group
the sampling is random, or as nearly so as possible. Stratified
sampling is illustrated in the standardization of the 1937
Stanford-Binet Scale in the course of which approximately 3000
children were tested. To insure an adequate selection of American
youth, the occupational levels of the parents of the children in
the standard group were checked against the six occupational levels
of employed males in the general population as shown by the U.S.
Census of 1930. Differing proportions of men were found in the
groups classified as professionals, semi-professionals, businessmen,
farmers, skilled laborers, slightly skilled and unskilled laborers. Only
4% of employed males were found in the professional group, while
31% were in the skilled labor group. Accordingly, only 4% of the
children in the Stanford-Binet standardization group could have
fathers in the professional category, while 31% could have fathers in
the skilled labor group. In public opinion polling, the investigator
must see that his sample takes account of various strata or criteria
such as age, sex, political affiliation, urban and rural residence,
etc.

When sampling is stratified, the SE formula for the mean differs
slightly from the $SE_M$ formula when sampling is strictly random.
The new formula is

$$\sigma_M = \sqrt{\frac{\sigma^2 - \sigma_s^2}{N - 1}} \qquad (49)$$

(SE of M when sampling has been stratified)

in which σ = SD of the entire sample
$\sigma_s$ = SD of the means of the various strata around the mean
of the entire sample.

A convenient formula for $\sigma_s$ is

$$\sigma_s = \sqrt{\frac{N_1(M_1 - M)^2 + N_2(M_2 - M)^2 + \cdots + N_k(M_k - M)^2}{N}} \qquad (50)$$

(standard deviation of the means of strata around the mean
of the entire group)

in which $N_1, N_2 \ldots N_k$ = number of cases in strata 1 to k; and
N and M are the size and mean of the whole sample.

To illustrate formula (49), suppose that in a sample of 400 cases,
there are 8 sub-groups or strata which vary in size from 70 to 22.
The M of the whole sample is 80 and σ is 15. The SD of the means
of the 8 strata [by (50)] around the general mean of 80 is known to
be 5. Substituting in (49) we have

$$\sigma_M = \sqrt{\frac{225 - 25}{399}} = \sqrt{\frac{200}{399}} = .71$$

Had no account been taken of the variation in the sub-groups, $\sigma_M$
would have been $15/\sqrt{400}$ or .75. Unless the various strata introduce
considerable variation, it is obvious that the correction got by using
(49) instead of (39) is fairly small.
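
The arithmetic of formulas (49) and (50) may be checked with a short Python sketch. The function names below are hypothetical, and the divisor N − 1 in the stratified formula is read off the worked example above (399 for a sample of 400); treat both as assumptions rather than as a definitive statement of the method.

```python
import math


def se_mean_random(sd, n):
    """SE of the mean under strictly random sampling: sd / sqrt(N)."""
    return sd / math.sqrt(n)


def sd_of_strata_means(sizes, means):
    """SD of the strata means around the mean of the whole sample (50)."""
    n = sum(sizes)
    grand_mean = sum(size * mean for size, mean in zip(sizes, means)) / n
    weighted_squares = sum(
        size * (mean - grand_mean) ** 2 for size, mean in zip(sizes, means)
    )
    return math.sqrt(weighted_squares / n)


def se_mean_stratified(sd, sd_strata, n):
    """SE of the mean under stratified sampling (49), divisor N - 1 assumed."""
    return math.sqrt((sd ** 2 - sd_strata ** 2) / (n - 1))


# Worked example from the text: N = 400, SD = 15, SD of strata means = 5.
stratified = se_mean_stratified(15, 5, 400)
unstratified = se_mean_random(15, 400)
print(round(stratified, 2), round(unstratified, 2))
```

The two printed values correspond to the stratified and the simple random SE of the mean in the illustration; the gap between them grows only as the strata means spread farther apart.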

(3) INCIDENTAL SAMPLING 


The term incidental sampling (also called “accidental” sampling)
should be applied to those groups which are used chiefly
because they are easily or readily obtainable. School children,
college sophomores enrolled in psychology classes, and laboratory
animals are available at times, in numbers, and under conditions
none of which may be of the experimenter’s choosing. Such casual
groups rarely constitute random samples of any definable population.
Reliability formulas apply with a high degree of approximation,
if at all, to incidental samples. And generalizations based upon
such data are often misleading.


(4) PURPOSIVE SAMPLING 


A sample may be expressly chosen because, in the light of available
evidence, it mirrors some larger group with reference to a given
characteristic. Newspaper editors are believed to reflect accurately
public opinion upon various social and economic questions in their
sections of the country. A sample of housewives may represent
accurately the buyers of canned goods; a sample of brokers, the
opinion of financiers on a new stock issue. If the saying “As Maine
goes, so goes the Nation” is accepted as correct, then Maine becomes
an important barometer (a purposive sample) of political thinking.
Random sampling formulas apply more or less accurately to
purposive samples.


2. Size of sample 


The reliability of M or σ depends (p. 182) upon the size of the
sample upon which the SE is based. SE's vary inversely as the
square root of sample size so that the larger the N in general the
smaller the SE. A small sample is often satisfactory in an intensive
laboratory study in which many measurements are taken upon
each subject. But if N is less than 25, say, there is often little reason
in believing such a small group of persons to be adequately
descriptive of any population.

The larger the N the larger the SD of the sample and the more
inclusive (and presumably representative) our sample becomes of
the general population. The range covered by samples of different
sizes, when all are drawn from a normal population, will be
approximately as follows:

N = 10      Range ± 2.0σ
N = 50      Range ± 2.5σ
N = 200     Range ± 3.0σ
N = 1000    Range ± 3.5σ

A range of ± 3.5σ from the mean includes 9995 cases in 10,000 in a
normally distributed population. In a sample of 10,000 only 5 cases
lie outside of this range; in a sample of 100 cases none lies outside
of this range. The more extreme the score, large or small, the less
the probability of its occurrence in a small sample. In fact, in very
small samples widely deviant scores will rarely appear in a random
sample drawn from a normal group.

A fairly simple and practical method of deciding when a sample 
is “sufficiently large” is to increase N until the addition of extra 
cases, drawn at random, fails to produce any appreciable change 
(more than ±1 SE, say) in the M and σ. When this point is
reached, the sample is probably large enough to be taken as ade- 
quately descriptive of its population. But the corollary must be 
recognized that mere numbers in and of themselves do not guarantee 
a random sample. (See also p. 114.)


3. Sampling fluctuations and errors of measurement 


SE's measure (1) errors of sampling and (2) errors of measure- 
ment. We have already considered the question of sampling errors 
on page 185. The investigator in establishing generalizations from 
his data regarding individual differences, say, must perforce make 
his observations upon limited groups or samples drawn at random 
from the population. Owing to differences among individuals and 
groups, plus chance factors (errors of measurement), neither the 
sample in hand nor another similarly drawn and approximately of 
the same size will describe the population exactly. Hence it is
unlikely that M’s and σ’s from successive samples will equal each other.
Variations from sample to sample—the so-called “errors” of sam- 
pling—are not to be thought of as mistakes, failures and the like, 
but as fluctuations arising from the fact that no two samples are 
ever quite alike. Means and σ’s from random samples are, then,
estimates of their parameters, and the SE formulas measure the 
goodness of this estimate. 

The term errors of measurement includes all of those variable
factors which affect test scores, sometimes in the plus and sometimes
in the minus direction. If the $SE_M$ is large, it does not follow
necessarily that the mean is affected by a large sampling error, as much of
the variation may be due to errors of measurement. When errors of
measurement are low, however (reliability of tests high, see p. 348),
a large $SE_M$ indicates considerable sampling error.


4. Bias in sampling and constant errors


Errors which arise from inadequate sampling or from bias of any
sort are neither detected nor measured by reliability formulas. The
mean score on an aptitude test achieved by 200 male college freshmen
in a college of high admission standards will not be representative
of the aptitude of the general male population between the ages
of 18 and 21, say, and for this reason the $SE_M$ for this group is not an
adequate measure of sampling fluctuations. College freshmen usually
constitute an incidental (and often a highly biased) sample. In
consequence, other samples of young men 18-25, drawn at random
from the male population, will return very different means and σ's
from those in our group. Differences like these are not sampling
fluctuations but are errors due to inadequate or biased selection.
Reliability formulas do not apply.

SE’s do not detect constant errors. Such errors work in only one
direction and are always plus or minus. They arise from many
sources: familiarity with test materials prior to examination, cheating,
fatigue, faulty techniques in administering and in scoring tests,
in fact from a consistent bias of any sort. SE's are of doubtful value
when computed from scores subject to large constant errors. The
careful study of successive samples, rechecks when possible, care in
controlling conditions, and the use of objective tests will reduce
many of these troublesome sources of error. The research worker
cannot learn too early that even the best statistical techniques are
unable to make bad data yield valid results.


PROBLEMS


1. Given M = 26.40; σ = 5.20; N = 100
(a) What is the probable divergence of this M from its parameter
(true mean) at the .01 level of confidence?
(b) What is the probable divergence of σ from its true (population)
value at the .05 level of confidence?
(c) Find the .99 confidence-interval for the true mean.

2. The mean of 16 independent observations of a certain magnitude is
100 and the SD is 24.
(a) At the .05 confidence level what are the fiduciary limits of the true
mean? (p. 189)
(b) Taking the .99 confidence-interval as our standard, we may be
assured that the true mean is at least as large as what value?

3. For a given group of 500 soldiers the mean AGCT score is 95.00 and
the SD is 25.
(a) Determine the .99 confidence-interval for the true mean.
(b) It is unlikely that the true mean is larger than what value?

4. The mean of a large sample is K and $\sigma_K$ is 2.50. What are the chances
that the sample mean misses the true mean by more than (a) ±1.00;
(b) ±3.00; (c) ±10.00?

5. The following measures of perception span for unrelated words are
obtained from 5 children: 5 6 4 7 5
(a) Find the .99 confidence-interval for the true mean of these scores.
(b) Compare the fiduciary limits (.99 confidence-interval) when
calculated by large sample methods with the result in (a).

6. Suppose it is known that the SD of the scores in a certain population
is 20. How many cases would we need in a sample in order that the SE
(a) of the sample M be 2?
(b) of the sample SD be 1?

7. In a sample of 400 voters, 50% favor the Democratic candidate for
president. How often can we expect polls based on random samples of
400 to return percents of 55 or more in favor of the Democrats?

8. Opinion upon an issue seems about equally divided. How large a
sample (N) would you need to be sure (at .01 level) that a deviation of
3% in a sample is not accidental (due to chance)?

9. Given an r of .45 based upon 60 cases,
(a) Using formula (47), p. 197, find the $SE_r$. Determine the limits of
the .99 confidence-interval for the population r.
(b) Convert the given r into z, and find $\sigma_z$ by formula (48). Check the
limits of the .99 confidence-interval determined from $\sigma_z$ against
those found in (a) above.
(c) Is the given r significant at the .01 level? (Use Table 25.)

10. An r of .81 is obtained from a random sample of 37 cases.
(a) Establish the fiduciary limits of the true r at the .01 level, using the
z-conversion.
(b) Check the significance of r from Table 25.

11. Given a sample of 500 cases in which there are six sub-groups or strata.
The means of the six sub-groups are 50 (N = 100), 54 (N = 50), 46
(N = 100), 50 (N = 120), 58 (N = 80), 42 (N = 50). The SD for
the entire sample is 12.
(a) Find the mean of the whole sample of 500 (p. 272).
(b) Compute the $\sigma_M$ by formula (49) (p. 206).
(c) Compare $\sigma_M$ by formula (39) with the result found in (b).

12. Fill in the following table:

      Size of Sample (N)    df (N − 2)    r       Significance
(a)   15                    13            −.68
(b)   30                    28            .32
(c)   82                    80            −.40
(d)   225                   223           .05

ANSWERS


1. (a) We may be confident at the .01 level that the obtained M does not
miss the TM by more than ±1.34 (TM ± 1.34).
(b) ±.73 (Tσ ± 1.96 × .37).
(c) 27.74 to 25.06.
2. (a) 112.78 and 87.22
(b) 82.3
3. (a) 97.89 to 92.11
(b) 97.89
4. 69 in 100; 23 in 100; less than 1 in 100
5. (a) 7.75 to 3.05
(b) By large sample methods (±2.58σ) fiduciary limits are 6.59 to
4.21.
6. (a) 100 (b) 202
7. About once in 50 trials
8. 1850
9. (a) .72 to .18 (b) .67 to .15 (c) Yes
10. (a) .91 to .60
(b) Significant at .01 level
11. (a) 50.08 (b) .495 (c) .537 vs. .495
12. (a) Significant at .01 level
(b) Not significant
(c) Significant at .01 level
(d) Not significant


I. The Significance of Differences between Means and Medians


Suppose that we wish to discover whether ten-year-old boys and 
ten-year-old girls differ in mechanical aptitude. In attacking this 
problem, ordinarily we would first secure as large and as representa- 
tive a sample of ten-year-old boys and ten-year-old girls as possible, 
administer our mechanical aptitude tests, compute means and σ’s,
and find the difference between the two means. A large mean
difference in favor of the boys would offer strong evidence that boys of ten
are mechanically more apt than are girls of ten. Contrariwise, a
small difference (not more than 2-3 points, for example) would
clearly be unimpressive, and would suggest that further compara- 
tive tests might well show no difference at all between the two 
groups. 

When can we feel reasonably sure that a difference is large enough 
to be taken as real and dependable? This question involves the 
reliability of the measures compared, and its answer can rarely be 
stated in unequivocal terms. Reliability, as we found in Chapter 8, 
is always relative and can be stated only in terms of probability. A 
given difference is called reliable or significant when the probability 
is high that it cannot be explained away as temporary or accidental. 
And a difference is called non-significant when it appears to be
reasonably certain that it could easily have arisen from sampling
fluctuations (or sampling accidents) and hence implies no “real” or true
difference.


1. The null hypothesis


Experimenters have found the null hypothesis a useful tool in testing
the reliability of differences. In its simplest form (see p. 247), this
hypothesis asserts that there is no true difference between two
population means, and that the difference found between sample means is,
therefore, accidental and unimportant. The null hypothesis is akin
to the legal principle that a man is innocent until he is proved guilty.
It constitutes a challenge; and the function of an experiment is to
give the facts a chance to refute (or fail to refute) this challenge.
To illustrate, suppose it is claimed that Eskimos have keener vision
than Americans. This hypothesis is vaguely stated and cannot be
tested precisely as we do not know how much better the Eskimo’s
vision must be before it can be adjudged “keener.” If, however, we
assert that Eskimos do not possess keener vision than Americans, or
that the differences are trifling and unimportant (the true difference
being zero), this null hypothesis is exact and can be tested. If our
null hypothesis is untenable it must be rejected. And in discarding
our null hypothesis, what we are saying is that differences in visual
acuity as between Eskimos and Americans cannot be fully explained
as temporary and occasional.

2. The reliability of the difference between two independent means


In order to discover whether two groups differ sufficiently in mean
performance to enable us to say with confidence that a difference will
persist upon repetition of the experiment, we need a standard error
of the difference between the two means. Two situations with respect
to mean differences arise: those in which the means are uncorrelated
and those in which the means are correlated.

(1) THE SE OF THE DIFFERENCE ($\sigma_D$) WHEN MEANS ARE UNCORRELATED

The formula for the SE of the difference between uncorrelated or
independent means is

$$\sigma_D \text{ or } \sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$

or

$$\sigma_D \text{ or } \sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (51)$$

(standard error of the difference between two uncorrelated means)

in which $\sigma_{M_1}$ is the SE of the mean of the first group; $\sigma_{M_2}$ is the SE
of the mean of the second group; and $\sigma_D$ is the SE of the difference
between the two means. Means are uncorrelated when calculated
from different groups, or from uncorrelated tests administered to the
same group. From formula (51) it is clear that one way to find the
SE of the difference between two means is first to compute the SE's
of the two means themselves. Another way is to calculate $\sigma_D$
directly if $\sigma_{M_1}$ and $\sigma_{M_2}$ are not wanted.

Application of formula (51) is illustrated by the following
example:

Example (1) In a study of the intelligence of the foreign-born 
white draft during World War I, a sample of 611 native-born Nor- 
wegians and a sample of 129 native-born Belgians were found to 
test as follows on the “combined scale.” *


Country of Birth    Number of Cases    Mean Score    σ
Norway              611                12.98         2.47
Belgium             129                12.79         2.42


Would further testing of similar samples of Norwegians and Bel- 
gians give virtually this same result; or in further testing would 
the mean difference perchance be reduced to zero, or even reversed 
in favor of the Belgians? 


To answer these questions we have first computed the SE's of the
two means and from these the SE of the difference between the two
means. By formula (39) the SE's of the means are

Norwegians: $\sigma_{M_1} = \frac{2.47}{\sqrt{611}} = .0999$

Belgians: $\sigma_{M_2} = \frac{2.42}{\sqrt{129}} = .2130$

Substituting these SE's in formula (51) we have

$$\sigma_D = \sqrt{(.0999)^2 + (.2130)^2} = .24 \text{ (to two decimals)}$$


The actual difference between the means of Norwegians and Belgians,
then, is .19 (12.98 − 12.79) and the SE of this difference ($\sigma_D$)
is .24. In inquiring whether the two groups actually differ in mean
performance, we shall set up a null hypothesis, namely, that the
difference between the population means of Norwegians and Belgians
is zero, and that, except for accidental errors, mean differences
from sample to sample would all be zero. Stated specifically, we ask
whether, in view of its SE, the mean difference of .19 is really large
enough to cast grave doubt upon our null hypothesis.

* The “combined scale” included the 8 Alpha tests, the Stanford-Binet, and
tests 4, 5, 6, and 7 from Beta. The maximum score was 25. For the data given
in this problem, see Brigham, C. C., A Study of American Intelligence
(Princeton: Princeton University Press, 1923), pp. 120-121.

As a first step in making our test we compute a critical ratio, or
CR, by dividing the obtained difference by its SE ($CR = D/\sigma_D$).* In
the present problem, the CR = .19/.24 or .79. The distribution of
CR's is known to be normal around the population or true difference
when N is large. Hence, in testing our null hypothesis, we may set
up a normal distribution like that shown in Figure 47, in which the
mean is set at zero (true difference) and the σ of the distribution of
differences is .24 ($\sigma_D$). From the critical ratio our obtained
difference of .19 is seen to fall at a point $.79\sigma_D$ from the hypothetical
mean of zero; and the difference of −.19 falls at $-.79\sigma_D$.

FIG. 47 ($\sigma_D$ = .24; CR = .19/.24 = .79)

Now from Table A we know that 29% × 2 or 58% of the cases in
a normal distribution fall between the mean and $\pm.79\sigma_D$; and 42%
of the cases fall outside these limits. This means that under the
stipulated conditions we can expect differences as large or larger
than ±.19 to occur 42 times in 100 comparisons of Norwegians and
Belgians. A difference as large as ±.19, therefore, might readily
arise as a sampling fluctuation from zero and is clearly not
significant. Accordingly, we retain the null hypothesis and conclude with
confidence that, on the evidence, there is no real difference between
Norwegians and Belgians on the “combined scale.” When the null
hypothesis is retained (as here) the result may be stated also as
follows: there is good reason to believe that these two groups were
drawn from the same population with respect to tested intelligence
and differ only by sampling errors.

* CR really equals $\frac{(M_1 - M_2) - 0}{\sigma_D}$ or $\frac{D - 0}{\sigma_D}$: the difference (D) between the
two means is measured from zero in terms of $\sigma_D$ (see Fig. 47).
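
The steps of Example (1) can be retraced with a minimal Python sketch. The function names are hypothetical, and the normal-curve area is computed from `math.erf` rather than read from a printed table, so small discrepancies from the rounded figures in the text are to be expected; in particular the text rounds the SE of the difference to .24 before dividing, whereas the sketch carries the unrounded value.

```python
import math


def normal_cdf(z):
    """Area of the unit normal curve lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def se_of_difference(sd1, n1, sd2, n2):
    """Formula (51): SE of the difference between two uncorrelated means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def critical_ratio(mean1, mean2, se_diff):
    """Obtained difference divided by its SE."""
    return (mean1 - mean2) / se_diff


def two_tailed_probability(cr):
    """Chance of a difference this large or larger in either direction."""
    return 2.0 * (1.0 - normal_cdf(abs(cr)))


# Data of Example (1): Norwegians (N = 611) and Belgians (N = 129).
se_d = se_of_difference(2.47, 611, 2.42, 129)
cr = critical_ratio(12.98, 12.79, se_d)
p = two_tailed_probability(cr)
print(round(se_d, 2), round(cr, 2), round(p, 2))
```

A two-tailed probability near .42 is far above either the .05 or the .01 level, which is the sense in which the null hypothesis is retained here.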


(2) LEVELS OF SIGNIFICANCE 


The answer to the question of when a difference is to be taken as
statistically significant depends upon the probability of the given
difference arising “by chance” (p. 87); and it depends also upon the
purposes of the experiment (p. 186). Usually a difference will be
marked “significant” when the gap between the two sample means
points to or signifies a true difference between the parameters in the
population from which the samples were drawn. It would seem to be
fairly obvious, then, that before a judgment of “significant” or
“non-significant” can be made, some point or points must be found along
a probability scale which will serve to separate these two judgment
categories. At the same time, it must be recognized that judgments
of significance are never all-or-none but range over a wide scale of
probabilities, our confidence increasing as the probability of error
decreases.

Experimenters have for convenience chosen several arbitrary
standards, called levels of significance, of which the .05 and .01
levels are the most often used. The .05 and the .01 significance levels
are analogous to the .05 and .01 levels of confidence used in
estimating the reliability of the mean and other statistics (Chapter 8). The
confidence with which an experimenter rejects (or retains) a null
hypothesis will depend upon the level of significance reached. From
Table D we know that ±1.96σ mark off points in the normal
distribution to the left and right of which lie 5% of the cases (2½% at
each end). When a CR is 1.96 or more, therefore, we reject a null
hypothesis at the .05 level of significance, on the grounds that not
more than once in 20 trials would a difference occur as large or larger
than that obtained, if the true difference were zero. The CR of .79
in the problem of Norwegians and Belgians (p. 214) falls short of
1.96 (does not reach the .05 level of significance) and accordingly
the null hypothesis is retained.

The .01 level of significance is more exacting than is the .05 level.
From Table D we know that ±2.58σ mark off points to the left and
right of which lies 1% of the cases in a normal distribution. If the
CR is 2.58 or more, therefore, we reject the null hypothesis at the
.01 level of significance, on the grounds that not more than once in
100 trials would a difference of this size occur if the true difference
were zero. The significance of a difference may also be evaluated by
establishing confidence-intervals for the true difference, as was done
for the TM on page 187. Thus the limits specified by $D \pm 1.96\sigma_D$
define the .95 confidence-interval for the true D; and $D \pm 2.58\sigma_D$
define the .99 confidence-interval for the true D. By way of
illustration, we may again take the problem of comparing the intelligence
of the Norwegians and Belgians on page 214 where the D = .19
and the $\sigma_D$ = .24. The .99 confidence-interval for the true D is
.19 ± 2.58 × .24, or from −.43 to .81. This relatively wide range and
the fact that it runs from minus to plus through zero strengthens our
confidence in the inference that the true D could well be zero. In
fact, acceptance of the null hypothesis always means that zero lies
within the confidence-interval for the true difference.
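
The link between a confidence-interval for the true difference and retention of the null hypothesis can be sketched as follows. The helper names are hypothetical; the multipliers 1.96 and 2.58 and the figures D = .19 and SE = .24 are those used in the text.

```python
def confidence_interval(diff, se_diff, z):
    """Limits diff - z*SE and diff + z*SE for the true difference."""
    half_width = z * se_diff
    return diff - half_width, diff + half_width


def null_retained(interval):
    """The null hypothesis is retained when zero lies inside the interval."""
    low, high = interval
    return low <= 0.0 <= high


# Norwegians vs. Belgians: D = .19, SE of the difference = .24.
ci_95 = confidence_interval(0.19, 0.24, 1.96)
ci_99 = confidence_interval(0.19, 0.24, 2.58)
print([round(x, 2) for x in ci_95], null_retained(ci_95))
print([round(x, 2) for x in ci_99], null_retained(ci_99))
```

Both intervals straddle zero, so the same conclusion is reached as with the critical ratio: the obtained difference does not reach either level of significance.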


(3) TWO-TAILED AND ONE-TAILED TESTS OF SIGNIFICANCE


Under the null hypothesis, differences between obtained means
(i.e., $M_1 - M_2$) may be either plus or minus and as often in one
direction as in the other from the true difference of zero, so that in
determining probabilities we take both tails of the sampling
distribution (Fig. 47). This two-tailed test, as it is sometimes called, is the
most general test of significance. It should always be used when,
in accordance with the null hypothesis, our two groups have
conceivably been drawn from the same population with respect to the trait
being measured [see Example (1) above].

In many experiments our primary concern is with the direction of
the difference rather than with its existence in absolute terms.* This
situation arises when negative differences, if found, are of no
importance practically; or when a difference if it exists at all must of
necessity be positive. Suppose, for example, that we wish to determine the
increase in vocabulary resulting from additional weekly reading
assignments, or want to evaluate the gain in numerical computation
brought about by an extra hour of drill per day. It is unlikely that
additional reading will lead to an actual loss in vocabulary.
Moreover, if drill decreases arithmetic skill it would be the same as
though it had no effect; in either event we would drop the drill.
Only an increase as a result of drill, therefore, is of any practical
interest.

* Jones, Lyle V., “Tests of Hypotheses: One-sided vs. Two-sided
Alternatives,” Psychol. Bull., 1952, 49, 43-46.

In cases like these the one-tailed test of significance is appropriate. 
We may illustrate with Example (2). 


Example (2) We know from experience that intensive coach- 
ing increases reading skill. Therefore, if a class has been coached, 
our hypothesis is that it will gain in reading comprehension—fail- 
ure to gain or a loss in score is of no interest. At the end of a school 
year, Class A, which had received special coaching, averaged 5 
points higher on a reading test than Class B, which had received 
no coaching. The standard error of this difference was 3. Is the 
gain significant? 


To evaluate the 5 points gained, i.e., determine its significance, we 
must use the one-tailed and not the two-tailed test. The critical ratio 
is 5/3 or 1.67, and from Table D we find that 10% of the cases in a
normal distribution lie to the left and right of 1.65, so that 5% (P/2)
lie to the right of 1.65. Our critical ratio of 1.67 just exceeds 1.65 and 
is, therefore, significant at the .05 level. We reject the null hypothe- 
sis, therefore, since only once in 20 trials would a gain as large or 
larger than 5 occur by chance. When a critical ratio is 2.33 (P = .02 
and P/2 = .01) we mark a positive difference significant at the .01 
level. It may be noted that in using the one-tailed test the
experimenter sets up the hypothesis he wishes to test before he takes his
data. This means that the experiment is designed at the outset to 
test the hypothesis; an hypothesis cannot be proposed to fit the data 
after they are in. If in Example (2) we had been interested simply in 
whether Class A and Class B were significantly different in reading 
score, the two-tailed test would have been appropriate. As we have 
seen, the two-tailed test gives us the probability of a mean positive 
difference of 5 points (A ahead of B), together with the probability of
a mean negative difference (loss) of 5 points (B ahead of A). This is 
true since under the null hypothesis fluctuations of sampling alone will 
tend to show A-samples better than B-samples, and B better than A, 
about equally often. A difference in favor of either A or B, there- 
fore, is possible and equally acceptable. 
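
The contrast between the two tests in Example (2) may be put in a short Python sketch. The function name is hypothetical, and the tail area is computed from `math.erfc` instead of being read from Table D; the figures (a gain of 5 with a standard error of 3) are those of the example.

```python
import math


def upper_tail_probability(z):
    """Area of the unit normal curve lying above z (one tail only)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Example (2): gain of 5 points, standard error of the difference 3.
cr = 5 / 3
one_tailed = upper_tail_probability(cr)  # only gains count as evidence
two_tailed = 2 * one_tailed              # a difference in either direction
print(round(cr, 2), round(one_tailed, 3), round(two_tailed, 3))
```

The one-tailed probability falls just under .05 while the two-tailed probability is about twice as large, which is why the same critical ratio is significant at the .05 level under the one-tailed test only. The choice between them must be made before the data are taken.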

The one-tailed test should be used when we wish to determine the
probability of a score occurring beyond a stated value. An
illustration is given in Example (3) below.


Example (3) In certain studies of deception among school
children the scores achieved on tests given under conditions in which
cheating was possible were compared with scores achieved by
comparable groups under strictly supervised conditions. In a certain
test given under “honest” conditions the mean is 62 and the σ is
10. Several children who took the test under non-supervised
conditions turned in scores of 87 and above. Is it probable that these
children cheated?


The mean of 62 is 24.5 score units from 86.5, the lower limit of 
score 87. Dividing 24.5 by 10 we find that scores of 87 and above 
lie at the point 2.45σ above the mean of 62. On the assumption of
normality of distribution, there is less than one chance in 100 that
a score of 87 or more will appear in the “honest” distribution. While
scores of 87 and above might, of course, be “honest,” examinees who 
make such scores under non-supervised conditions are certainly open 
to suspicion of having cheated. The one-tailed test is appropriate 
here as we are concerned only with the positive end of the distribu- 
tion—the probability of scores of 87 and above. 
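
The arithmetic of Example (3) can be checked with a short sketch. It assumes, as the text does, that the "honest" scores are normally distributed with mean 62 and σ 10; the function name `upper_tail` is illustrative only and is not part of the original treatment.

```python
import math

def upper_tail(z):
    """One-tailed probability that a unit normal deviate is z or larger."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean, sigma = 62.0, 10.0
lower_limit = 86.5                    # lower limit of score 87
z = (lower_limit - mean) / sigma      # distance above the mean in sigma units
p = upper_tail(z)                     # chance of 87 or more under "honest" conditions

print(f"z = {z:.2f}, P(score >= 87) = {p:.4f}")
```

The probability comes out below .01, which agrees with the statement that there is less than one chance in 100 of such a score.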


(4) ERRORS IN MAKING INFERENCES 

In testing hypotheses two types of wrong inference can be made 
and must be reckoned with by the experimenter.* What are called 
Type I errors are present when the hypothesis is true but our test of 
significance leads us to believe it to be false; Type II errors arise 
when the hypothesis is false, but our test of significance leads us to 
believe it to be true. Stated in different terms, we make an error of 
Type I if we reject the null hypothesis when it is true (claim signifi-
cance when none exists); and we commit an error of Type II if we
accept the null hypothesis when it is false (mark a finding not-sig-
nificant when a real difference is present).

Various precautions must be taken to avoid both sorts of erroneous
inference. A low significance level (P greater than .05, say) increases
the possibility of Type I errors; and a high significance level (.05 to
.01) renders such erroneous inferences less likely. How this works
out can perhaps be shown best by a simple example. Suppose that a
quarter known to us to be a good coin is suspected by an experi-
menter of a bias in favor of heads.† When our experimenter tosses

* Treloar, Alan E., Elements of Statistical Reasoning (New York: Wiley and
Sons, 1939), pp. 149-151.
McNemar, Q., Psychological Statistics (New York: Wiley and Sons, 1949),
pp. 69-71.


† If a coin is “loaded” or weighted on the “tails” side, the “heads” side, being
lighter, will tend to appear more often than tails.


this coin 10 times, it turns up 8 heads and 2 tails. The theoretical
expectation for a good coin is, of course, 5 heads and 5 tails; and the
specific question for the experimenter to decide is whether the occur-
rence of 8 heads represents a “heads” bias, that is, a significant deviation
from the expected 5 heads. The distribution of heads and tails ob-
tained when a single coin is tossed 10 times is given by expansion
of the binomial $(p + q)^{10}$, where p = probability of a head and
q = probability of a tail (non-head). The mean of $(p + q)^n$ is $np$
and the SD is $\sqrt{npq}$; hence in our example the mean is 5 and the SD
is $\sqrt{10 \times 1/2 \times 1/2}$ or 1.58. A "score" of 8 extends over the interval
7.5-8.5, so that to determine the probability of 8 or more, the CR we
wish is $\frac{7.5 - 5}{1.58}$ or 1.58. (See Fig. 48.) (A problem similar to this
will be found on p. 252.) From Table A we know that 8 or more
heads, that is, a CR of 1.58, may be expected on the null hy-
pothesis approximately 6 times in 100 trials.* If our experi-
menter is willing to accept P = .06 as significant (i.e., set his stand-
ards low), he will reject the null hypothesis, although it is true.
That is, he will report the coin to be biased in favor of heads,
although it is in fact a good coin.
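
As a rough check on the coin example, the sketch below computes the critical ratio from the normal approximation and, for comparison, the exact binomial probability of 8 or more heads. The exact calculation is an addition for illustration and is not part of the text's procedure; both give a probability of about .06.

```python
import math

def upper_tail(z):
    """One-tailed probability that a unit normal deviate is z or larger."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p = 10, 0.5
mean = n * p                          # np = 5
sd = math.sqrt(n * p * (1 - p))       # sqrt(npq), about 1.58
cr = (7.5 - mean) / sd                # 7.5 is the lower limit of the "score" 8
p_normal = upper_tail(cr)             # normal-curve probability of 8 or more heads

# Exact probability: sum the binomial terms for 8, 9 and 10 heads.
p_exact = sum(math.comb(n, k) for k in range(8, n + 1)) / 2 ** n

print(f"CR = {cr:.2f}, normal P = {p_normal:.3f}, exact P = {p_exact:.3f}")
```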

* This is a one-tailed test (p. 217) because our experimenter's hypothesis was
that the coin is biased in favor of heads.

If our experimenter had set his significance level higher (say .01 or
even .05) he would have avoided this erroneous inference. Further-
more, had he increased the number of tosses of the coin from 10 to
100 or even 500, he might have avoided his wrong inference, as heads 
and tails in a good coin will tend to occur equally often. Increas- 
ing the experimental data gives the null hypothesis a chance to assert 
itself (if true) and guards against freak results. We should not be 
willing to reject a null hypothesis too quickly, as in so doing we 
must assert the existence of a real difference—often a heavy 
responsibility. 

In direct contradiction to what happens in the case of Type I 
errors, the possibility of drawing erroneous inferences of Type II
(acceptance of the null hypothesis when false) is increased when we 
set very high levels of significance. This can be shown by reference 
to the coin example above—with a change in conditions. Suppose 
that a quarter which is known to us to be biased in favor of heads is 
also suspected by an experimenter of bias in favor of heads. This 
coin is tossed 10 times and shows, as did the coin before, 8 heads and 
2 tails. From the data above, on page 220, we know that in a good 
coin 8 or more heads can be expected by chance 6 in 100 times, that is,
P = .06. Hence, if our experimenter sets .01 as his level of signifi-
cance (or even .05) he will accept the null hypothesis and mark
his result “not significant” although the coin is now actually
biased.


How can we guard against both of these types of erroneous infer-
ence? Perhaps the wisest course is first to demand more evidence,
that is, give the data a chance to refute (or fail to refute) the null
hypothesis. Additional data, further repetition of the experiment,
and better control will often make possible a definite conclusion. If
a coin is biased toward returning heads, this bias will continue to
cause more heads than tails to appear in further tosses. For example,
if the ratio of 8 heads to 2 tails in the 10 tosses described in the last
paragraph holds consistently, we shall get 80 heads and 20 tails in
100 throws. The critical ratio for 100 tosses will be 5.9* (as com-
pared with 1.58 for 10 tosses), and the probability is far less than .01
that 80 heads is a random fluctuation from the expected 50 heads.
Our experimenter would correctly mark this result very significant,
i.e., significant beyond the .01 level.
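
The growth of the critical ratio with the number of tosses can be sketched as follows. The helper `critical_ratio` is a hypothetical name; it uses the lower limit of the observed count, as in the computations for 10 and for 100 tosses.

```python
import math

def critical_ratio(heads, n, p=0.5):
    """CR for `heads` or more in n tosses, measured from the lower limit heads - 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (heads - 0.5 - mean) / sd

cr_10 = critical_ratio(8, 10)      # 8 heads in 10 tosses
cr_100 = critical_ratio(80, 100)   # same 8-to-2 ratio in 100 tosses

print(f"CR for 10 tosses = {cr_10:.2f}; CR for 100 tosses = {cr_100:.2f}")
```

The same proportion of heads yields a far larger critical ratio when the number of tosses is increased, which is why additional data make a definite conclusion possible.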
* When n = 100, p = .50, q = .50:

$$M = np = 50$$
$$\sigma = \sqrt{npq} = \sqrt{100 \times 1/2 \times 1/2} = 5$$
$$CR = \frac{79.5 - 50}{5} = 5.9$$

Setting a high level of significance will tend, then, to prevent
Type I errors but will encourage the appearance of Type II errors. 
Hence it appears that an experimenter must decide which kind of 
wrong inference he would rather avoid, as apparently he can pre- 
vent one type of error only at the risk of making the other more 
likely. In the long run, errors of Type I (rejecting a null hypothesis 
when true, by marking a non-significant difference significant) are 
perhaps more likely to prove serious in a research program in psy-
chology than are errors of Type II. If an experimenter claims a
significant finding erroneously, for instance, the fact that it is a
positive result is likely to terminate the research, so that the error 
persists. When a high level of significance is demanded (.01, say) 
we may feel assured that significance will be claimed incorrectly not 
more than once in 100 trials. 
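
The trade-off between the two kinds of error can be made concrete with the 10-toss coin experiment. The sketch below is illustrative only: it uses exact binomial probabilities rather than the normal curve, and it ASSUMES a biased coin whose true probability of heads is .8, a figure not given in the text. The names `tail_prob`, `error_rates` and `ASSUMED_BIAS` are hypothetical.

```python
from math import comb

def tail_prob(c, n, p):
    """P(X >= c) for a binomial with n tosses and head-probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

N_TOSSES = 10
ASSUMED_BIAS = 0.8   # hypothetical true P(heads) of the biased coin (an assumption)

def error_rates(c):
    """Error rates for the rule 'call the coin biased if c or more heads appear'."""
    alpha = tail_prob(c, N_TOSSES, 0.5)                 # Type I: good coin called biased
    beta = 1 - tail_prob(c, N_TOSSES, ASSUMED_BIAS)     # Type II: biased coin passed as good
    return alpha, beta

for c in (8, 9, 10):
    alpha, beta = error_rates(c)
    print(f"reject at {c}+ heads: Type I = {alpha:.4f}, Type II = {beta:.4f}")
```

As the rejection rule is made stricter the Type I rate falls and the Type II rate rises, which is the dilemma described above.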

Errors of Type II (accepting the null hypothesis when false, i.e., 
when a real difference exists) must be watched for carefully when 
the experimental factor or factors are potentially dangerous. Thus, 
if one is studying the psychological effects of a drug suspected of
inducing rather drastic emotional and temperamental changes, an 
error of Type II might well prove to be disastrous. Fortunately, the 
fact that a negative finding is inconclusive and often unsatisfactory 
may lead to further experimental work, and thus obviate somewhat 
the harm done by Type II errors. Especially is this true when the 
problem is important enough further to challenge the investigator. 

For many years it was customary for investigators in experimental 
psychology to demand critical ratios of 3.00 or more before marking 
a difference significant. This extremely high standard almost cer- 
tainly caused the null hypothesis to be accepted more often than it 
should have been—a Type II error on the side of conservatism. As 
a general rule it is probably wise to demand a significance level of at 
least .01 in most experimental research, i.e., to risk Type II errors 
by preventing those of Type I. But the .05 level is often satisfactory, 
especially in preliminary work. 


(5) RELIABILITY OF THE DIFFERENCE BETWEEN MEANS IN SMALL 
INDEPENDENT SAMPLES 
When the N's of two independent groups are small (less than 30,
say) the SE of the difference between means should depend upon
SD's calculated by the formula $SD = \sqrt{\frac{\Sigma x^2}{N - 1}}$, and the degrees of
freedom in the two groups must be considered. Table D may then
be used conveniently to test the significance of t,* which is the ap-
propriate critical ratio to be used with small samples. An example
will illustrate the procedures.


Example (4) An interest test is administered to 6 boys in a
Vocational class and to 10 boys in a Latin class. Is there a sig-
nificant difference in mean score between the two groups?


Scores are as follows:


       Vocational Class                   Latin Class
           N1 = 6                           N2 = 10
   Scores (X1)    x1    x1²        Scores (X2)    x2    x2²
       28         -2      4            20         -4     16
       35          5     25            16         -8     64
       32          2      4            25          1      1
       24         -6     36            34         10    100
       26         -4     16            20         -4     16
       35          5     25            28          4     16
   6 | 180              110            31          7     49
   M1 = 30                             24          0      0
                                       27          3      9
                                       15         -9     81
                                  10 | 240              352
                                  M2 = 24

   N1 - 1 = 5          N2 - 1 = 9          df = 14


$$SD = \sqrt{\frac{110 + 352}{14}} = 5.74 \quad \text{by (52)}$$

$$SE_D = 5.74\sqrt{\frac{6 + 10}{6 \times 10}} = 5.74 \times .5163 = 2.96 \quad \text{by (53)}$$

$$t = \frac{(30 - 24) - 0}{2.96} = 2.03$$


For 14 df, the .05 level (Table D) is 2.14; and the .01 level is 2.98.


* t is a critical ratio in which a more exact estimate of the $\sigma_D$ is used. The
sampling distribution of t is not normal when N is small (less than 50, say).
t is a CR; but all CR's are not t's (see p. 215).

The mean of the interest scores made by the 6 boys in the Voca-
tional class is 30, and the mean of the interest scores made by the
10 boys in the Latin class is 24. The mean difference of 6 is to be
tested for significance. When two samples are small, as here, we
get a better estimate of the "true" SD (σ in the population) by pool-
ing the sums of squares of the deviations taken around the means of
the two groups and computing a single SD.* The justification for
pooling is that under the null hypothesis no real mean difference ex-
ists as between the two samples, which are assumed to have been
drawn from the same parent population. We have, therefore, only
one σ (that of the common population) to estimate. Furthermore, by
increasing N we get a more stable SD based upon all of our cases.
The formula for computing this “pooled” SD and the formula for the
SE of the difference are as follows:


$$SD = \sqrt{\frac{\Sigma(X_1 - M_1)^2 + \Sigma(X_2 - M_2)^2}{(N_1 - 1) + (N_2 - 1)}} \qquad (52)$$

(SD when two small independent samples are pooled)


$$SE_D = SD\sqrt{\frac{N_1 + N_2}{N_1 N_2}} \qquad (53)$$

(SE of the difference between means in small independent samples)


* The SD so computed is subject to a slight negative bias, which is negligible
when N > 20, say. See Holtzman, W. H., “The Unbiased Estimate of the Pop-
ulation Variance and Standard Deviation,” Amer. Jour. Psychol., 1950, 63,
615-617.

† 1 df is “used up” in computing each mean (p. 193).

In formula (52), $\Sigma(X_1 - M_1)^2 = \Sigma x_1^2$ is the sum of the squared
deviations around the mean of Group 1; and $\Sigma(X_2 - M_2)^2 = \Sigma x_2^2$
is the sum of the squared deviations around the mean of Group 2.
These sums of squares are combined to give a single SD. In Exam-
ple (4) the sum of squares in the Vocational class around the mean
of 30 is 110; and in the Latin class the sum of squares around the
mean of 24 is 352. The df are $(N_1 - 1) = 5$, and $(N_2 - 1) = 9$.† By
formula (52), therefore, the $SD = \sqrt{\frac{110 + 352}{14}}$ or 5.74. This SD
serves as a measure of variability for each of the two groups. Thus
the $SE_{M_1} = \frac{5.74}{\sqrt{6}}$ and the $SE_{M_2} = \frac{5.74}{\sqrt{10}}$ [by formula (39), p. 182].
Combining these two SE's by formula (51) we find that
$SE_D = \sqrt{\frac{(5.74)^2}{6} + \frac{(5.74)^2}{10}} = 5.74\sqrt{\frac{16}{60}}$ or 2.96. Formula (53) combines the
two $SE_M$'s, enabling us to calculate $SE_D$ in one operation.
t is 6/2.96 or 2.03; and the df in the two groups (namely, 5 and 9)
are combined to give 14 df for use in inferring the significance of the
mean difference. Entering Table D with 14 df, we get the entries
2.14 at the .05 and 2.98 at the .01 levels. Since our t does not reach
the .05 level, the obtained mean difference of 6 must be marked
non-significant.
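
The steps of Example (4) can be retraced from the raw scores in a short sketch. The function names are illustrative; the scores are those of the Vocational and Latin classes, which reproduce the sums of squares 110 and 352.

```python
import math

vocational = [28, 35, 32, 24, 26, 35]
latin = [20, 16, 25, 34, 20, 28, 31, 24, 27, 15]

def sum_of_squares(scores):
    """Sum of squared deviations around the group's own mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def pooled_t(a, b):
    n1, n2 = len(a), len(b)
    df = (n1 - 1) + (n2 - 1)
    sd = math.sqrt((sum_of_squares(a) + sum_of_squares(b)) / df)   # formula (52)
    se_d = sd * math.sqrt((n1 + n2) / (n1 * n2))                   # formula (53)
    t = (sum(a) / n1 - sum(b) / n2) / se_d
    return sd, se_d, t, df

sd, se_d, t, df = pooled_t(vocational, latin)
print(f"SD = {sd:.2f}, SE_D = {se_d:.2f}, t = {t:.2f}, df = {df}")
```

Carrying more decimals gives a t of about 2.02 rather than 2.03; either way it falls short of the 2.14 required at the .05 level for 14 df.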

A second example will illustrate further the use of levels of sig- 
nificance when samples are small. 

Example (5) On an arithmetic reasoning test 31 ten-year-old
boys and 42 ten-year-old girls made the following scores:


             Mean      SD      N
   Boys:    40.39    8.69     31
   Girls:   35.81    8.33     42


Is the mean difference of 4.58 in favor of the boys significant?
By formula (52) we find

$$SD = \sqrt{\frac{(8.69)^2 \times 30 + (8.33)^2 \times 41}{71}} = 8.48^{*}$$

And by formula (53),

$$SE_D = 8.48\sqrt{\frac{31 + 42}{31 \times 42}} = 2.01$$

t is 4.58/2.01 or 2.28 and the degrees of freedom for use in testing the
significance of the mean difference are 30 + 41 or 71. Entering
Table D with 71 df we find t-entries of 2.00 at the .05 and of 2.65 at
the .01 levels. The obtained t of 2.28 is significant at the .05 but not
at the .01 level. Less than once in 20 comparisons of boys and girls on
this test would we expect to find a difference as large or larger than
4.58 under our null hypothesis. We may be reasonably confident,
therefore, that boys do better than girls on this test.
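
When only the means, SD's and N's are available, as in Example (5), the sums of squares are recovered as SD² × (N − 1) before pooling. A minimal sketch, with the hypothetical helper `pooled_from_summary`:

```python
import math

def pooled_from_summary(sd1, n1, sd2, n2):
    """Pooled SD (formula 52) and SE of the difference (formula 53) from summary figures."""
    ss1 = sd1 ** 2 * (n1 - 1)          # sum of squares recovered from SD and N
    ss2 = sd2 ** 2 * (n2 - 1)
    df = (n1 - 1) + (n2 - 1)
    sd = math.sqrt((ss1 + ss2) / df)
    se_d = sd * math.sqrt((n1 + n2) / (n1 * n2))
    return sd, se_d, df

sd, se_d, df = pooled_from_summary(8.69, 31, 8.33, 42)
t = (40.39 - 35.81) / se_d

print(f"SD = {sd:.2f}, SE_D = {se_d:.2f}, t = {t:.2f}, df = {df}")
```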
* $SD^2 = \frac{\Sigma x^2}{N - 1}$; hence $\Sigma x^2 = SD^2 \times (N - 1)$.


3. The reliability of the difference between two correlated means


(1) THE SINGLE GROUP METHOD

The last section dealt with the problem of determining whether the
difference between two means is significant when these means repre-
sent the performance of independent groups: boys and girls, Nor-
wegians and Belgians, and the like. A closely related problem is con-
cerned with the significance of the difference between correlated
means obtained from the same test administered to the same group
upon two occasions. This experimental design is called the "single
group" method. Suppose that we have administered a test to a
group of children and two weeks later have repeated the test. We
wish to measure the effect of practice or of special training upon the
second set of scores; or to estimate the effects of some activity inter-
polated between test and retest. In order to determine the signifi-
cance of the difference between the means obtained in the initial and
final testing, we must use the formula


$$SE_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2 r_{12}\,\sigma_{M_1}\sigma_{M_2}} \qquad (54)$$

(SE of the difference between correlated means)


in which $\sigma_{M_1}$ and $\sigma_{M_2}$ are the SE's of the initial and final test means,
and $r_{12}$ is the coefficient of correlation between scores made on initial
and final tests.* An illustration will bring out the difference between
formula (51) and formula (54).


Example (6) At the beginning of the school year, the mean 
score of a group of 64 sixth-grade children upon an educational 
achievement test in reading was 45.00 with a σ of 6.00. At the end
of the school year, the mean score on an equivalent form of the
same test was 50.00 with a σ of 5.00. The correlation between
scores made on the initial and final testing was .60. Has the class 
made significant progress in reading during the year? 


We may tabulate our data as follows:


                                 Initial         Final
                                 Test            Test
No. of children:                 64              64
Mean score:                      45.00 (M1)      50.00 (M2)
Standard deviations:             6.00 (σ1)       5.00 (σ2)
Standard errors of means:        .75 (σM1)       .63 (σM2)
Difference between means:        5.00
Correlation between initial and final tests: .60


Substituting in formula (54) we get

$$SE_D = \sqrt{(.75)^2 + (.63)^2 - 2 \times .60 \times .75 \times .63} = .63$$

The t-ratio is 5.00/.63 or 7.9. Since there are 64 children there are
64 pairs of scores and 64 differences,† so that the df becomes 64 - 1 or
63. From Table D the t for 63 df is 2.66 at the .01 level. Our t of 7.9
is far greater than 2.66 and hence is very significant. It seems clear,
therefore, that this class made substantial progress in reading over
the school year.

* The correlation between the means of successive samples drawn from a
given population equals the correlation between test scores, the means of which
are being compared.

† 1 df is lost since $SE_D$ is computed around the mean of the distribution of
differences (p. 192).
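
The substitution in formula (54) for Example (6) may be verified with the sketch below, which takes the rounded standard errors of the means (.75 and .63) as printed in the tabulation. The function name is illustrative.

```python
import math

def se_correlated_means(se_m1, se_m2, r12):
    """Formula (54): SE of the difference between two correlated means."""
    return math.sqrt(se_m1 ** 2 + se_m2 ** 2 - 2 * r12 * se_m1 * se_m2)

se_d = se_correlated_means(0.75, 0.63, 0.60)
t = 5.00 / se_d          # mean gain of 5 points divided by its SE
df = 64 - 1              # one df is lost in taking the mean of the differences

print(f"SE_D = {se_d:.2f}, t = {t:.1f}, df = {df}")
```

The unrounded SE of about .626 gives a t close to 8.0; the 7.9 of the text comes from dividing by the rounded .63. Both are far beyond the 2.66 required at the .01 level.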


When groups are small, a procedure called the “difference-method”
is often to be preferred to that given above. The following example
will serve as an illustration:


Example (7) Twelve subjects are given 5 successive trials upon
a digit-symbol test of which only the scores for trials 1 and 5 are
shown. Is the gain from initial to final trial significant?


   Trial 1     Trial 5       D        x       x²
     50          62          12        4       16
     42          40          -2      -10      100
     51          61          10        2        4
     26          35           9        1        1
     35          30          -5      -13      169
     42          52          10        2        4
     60          68           8        0        0
     41          51          10        2        4
     70          84          14        6       36
     55          63           8        0        0
     62          72          10        2        4
     38          50          12        4       16
                668     12 | 96               354
                   Mean_D = 8.0

$$SD_D = \sqrt{\frac{354}{11}} = 5.67$$


From the column of differences between pairs of scores, the mean
difference is found to be 8, and the SD around this mean ($SD_D$) by
the formula $SD = \sqrt{\frac{\Sigma x^2}{N - 1}}$ is 5.67. On our null hypothesis the true
difference between the means of Trials 5 and 1 is 0, so that we must
test our obtained mean gain of 8 against this hypothetical zero gain.
The SE of the mean difference, $SE_{M_D} = \frac{5.67}{\sqrt{12}}$, is 1.64 and $t = \frac{8 - 0}{SE_{M_D}}$
is 4.88. Entering Table D with 11 (12 - 1) degrees of freedom, we
find t-entries of 2.20 and 3.11 at the .05 and at the .01 levels. Our t
of 4.88 is far above the .01 level and the mean difference of 8 is obvi-
ously very significant.
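
The difference-method of Example (7) reduces to a few lines once the column of differences is in hand. The sketch below works from the twelve differences (Trial 5 minus Trial 1); the variable names are illustrative.

```python
import math

# Differences (Trial 5 minus Trial 1) for the twelve subjects of Example (7).
differences = [12, -2, 10, 9, -5, 10, 8, 10, 14, 8, 10, 12]

n = len(differences)
mean_d = sum(differences) / n                              # mean gain
sum_sq = sum((d - mean_d) ** 2 for d in differences)       # sum of squared deviations
sd_d = math.sqrt(sum_sq / (n - 1))                         # SD of the differences
se_md = sd_d / math.sqrt(n)                                # SE of the mean difference
t = (mean_d - 0) / se_md                                   # tested against a zero gain
df = n - 1

print(f"mean D = {mean_d}, SD_D = {sd_d:.2f}, SE = {se_md:.2f}, t = {t:.2f}, df = {df}")
```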

If our hypothesis initially had been that practice increases test 
score, we would have used the one-tailed test. The probability of a 
positive difference (gain) of 8 or more on the null hypothesis is quite 
remote. In the one-tailed test, for 11 df the .05 level is read from the
.10 column (P/2 = .05) to be 1.80 and the .01 level from the .02
column (P/2 = .01) is 2.72. Our t of 4.88 is much larger than the .01
level of 2.72 and there is little doubt but that the gain from Trial 1 to 
Trial 5 is significant. 

The result found in Example (7) may be checked by the single
group method. By use of formula (27), p. 145, the r between Trials 1
and 5 is found to be .944. Substituting for $r_{12}$ (viz., .944), for $\sigma_{M_1}$
(3.65) and for $\sigma_{M_2}$ (4.55) in formula (54) we get a $\sigma_D$ of 1.63 which
checks $SE_{M_D}$ within the error of computation. The “difference-
method” is quicker and easier to apply than is the longer method of
calculating SE's for each mean and the SE of the difference, and is
to be preferred unless the correlation between initial and final scores
is wanted.


(2) THE METHOD OF EQUIVALENT GROUPS: MATCHING BY PAIRS 


Formula (54) is applicable in those experiments which make use 
of equivalent groups as well as in those using a single group. In the 
method of equivalent groups the matching is done initially by pairs 
so that each person in the first group has a match in the second group. 
This procedure enables us to set off the effects of one or more experi- 
mentally varied conditions (experimental factors) against the ab- 
sence of these same variables (control). The following problem is 
typical of many in which the equivalent group technique is useful. 


Example (8) Two groups, X and Y, of seventh-grade children 
are paired child for child for age and score on Form A of the Otis 
Group Intelligence Scale. Three weeks later, both groups are given
Form B of the same test. Before the second test, Group X, the ex- 
perimental group, is praised for its performance on the first test and 
urged to try to better its score. Group Y, the control group, is given 
the second test without comment. Will the incentive (praise) cause 
the final scores of Group X and Group Y to differ significantly? 


The relevant data may be tabulated as follows:


                                            Experimental     Control
                                            Group X          Group Y
No. of children in each group:              72               72
Mean scores on Form A, initial test:        80.42            80.51
σ's on Form A, initial test:                23.61            23.46
Mean scores on Form B, final test:          88.63 (M1)       83.24 (M2)
σ's on Form B, final test:                  24.36 (σ1)       21.62 (σ2)
Gain, M1 - M2:                              5.39
Standard errors of means, final tests:      2.89             2.57
Correlation between final scores (experimental and control groups) = .65


The means and σ's of the control and experimental groups in
Form A (initial test) are almost identical, showing the original pair-
ing of scores to have been quite satisfactory. The correlation be-
tween the final scores on Form B of the Otis Test is calculated from
the paired scores of children who were matched originally in terms of
initial score.*

The difference between the means on the final test is 5.39
(88.63 - 83.24). The SE of this difference, $\sigma_D$, is found from formula
(54) to be

$$\sigma_D = \sqrt{(2.89)^2 + (2.57)^2 - 2 \times .65 \times 2.89 \times 2.57} = 2.30$$

The t-ratio is 5.39/2.30 or 2.34; and since there are 72 pairs, there are
(72 - 1) or 71 degrees of freedom. Entering Table D with 71 df we
find the t's at .05 and .01 to be 2.00 and 2.65, respectively. The given
difference is significant at the .05 but not at the .01 level; and we may
feel reasonably certain that the experimental and control groups
differ in their final mean scores on Form B of the Otis Test.

It is worth noting that had no account been taken of the correla-
tion between final scores on Form B [if formula (51) had been used
instead of (54)], $\sigma_D$ would have been 3.87 instead of 2.30. t would
then have been 1.39 instead of 2.34 and would have fallen consider-
ably below the .05 level of 2.00. In other words, a significant finding
would have been marked “not significant.” Evidently, it is impor-
tant that we take account of the correlation between final scores,
especially if it is high.

When r = .00, formula (54) reduces to (51) since group means
are then independent or uncorrelated. Also, when r is positive, the
$\sigma_D$ from formula (54) is smaller than the $\sigma_D$ from (51) and the larger
the plus r the greater the reduction in $\sigma_D$ by use of (54). For a given
difference between means, the smaller the $\sigma_D$ the larger the t and the
more significant the obtained difference. The relative efficiency ob-
tained by using a single group or equivalent groups as compared with
independent groups can be determined by the size of the r between
final scores, or between initial and final scores. The correlation coeffi-
cient, therefore, gives a measure of the advantage to be gained by
matching.

* The correlation between final scores in the method of equivalent groups
is analogous to the correlation between initial and final scores in the
single group method. In equivalent groups one group serves as the experimental
and the other as the control; in the single group method the initial scores
furnish the control.

If r is negative, formula (54) gives a larger $\sigma_D$ than that given by
formula (51). In this case, the failure to take account of the correla-
tion will lead to a smaller $\sigma_D$ and a t larger and apparently more
significant than it should be.

One further point may be mentioned. If the difference between
the means of two groups is significant by formula (51) it will, of
course, be even more significant by formula (54) if r is positive.
Formula (51) may be used in a preliminary test, therefore, if we can
be sure that the correlation is positive. The correlation between
initial and final score is usually positive, though rarely as high as
that found in Example (8).
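
The effect of the sign and size of r on the SE of the difference can be seen by evaluating formula (54) for several correlations with the standard errors of Example (8). In the sketch below only r = .65 comes from the example; r = .00 corresponds to formula (51), and r = -.65 is a hypothetical value included purely to show the negative case.

```python
import math

def se_difference(se_m1, se_m2, r=0.0):
    """Formula (54); with r = 0 it reduces to formula (51)."""
    return math.sqrt(se_m1 ** 2 + se_m2 ** 2 - 2 * r * se_m1 * se_m2)

se_m1, se_m2, mean_diff = 2.89, 2.57, 5.39    # figures from Example (8)

for r in (-0.65, 0.0, 0.65):                  # -0.65 is hypothetical, for illustration
    se_d = se_difference(se_m1, se_m2, r)
    print(f"r = {r:+.2f}: SE_D = {se_d:.2f}, t = {mean_diff / se_d:.2f}")
```

A positive r shrinks the SE and enlarges t; a negative r does the reverse, so that ignoring it would overstate the significance of the difference.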


(3) GROUPS MATCHED FOR MEAN AND SD


When it is impracticable or impossible to set up groups in which
subjects have been matched person for person, investigators often
resort to the matching of groups in terms of mean and σ. The match-
ing variable is usually different from the variable under study but is,
in general, related to it and sometimes highly. No attempt is made 
to pair off individuals and the two groups are not necessarily of the 
same size, although a large difference in N is not advisable. 

In comparing final score means of matched groups the procedure
is somewhat different from that used with equivalent groups.* Sup-
pose that X is the variable under study, and Y is the function or
variable in terms of which our two groups have been equated as to
mean and SD. Then if $r_{xy}$ is the correlation between X and Y in the
population from which our samples have been drawn, the SE of the
difference between means in X is


$$SE_D = \sigma_D = \sqrt{(\sigma_{M_{X_1}}^2 + \sigma_{M_{X_2}}^2)(1 - r_{xy}^2)} \qquad (55)$$

(SE of the difference between the X means of groups matched
for mean and for SD in Y)


* Wilks, S. S., “The Standard Error of the Means of ‘Matched’ Samples,”
Jour. Educ. Psychol., 1931, 22, 205-208.


An example will illustrate the procedure.


Example (9) The achievement of two groups of first-year high-
school boys, the one from an academic, the other from a technical
high school, is compared upon a Mechanical Ability Test. The two
groups are matched for mean and SD upon a general intelligence
test so that the experiment becomes one of comparing the mechani-
cal ability scores of two groups of boys of “equal” general intelli-
gence enrolled in different curricula. Data are as follows:


                                           Academic     Technical
No. of boys in each group:                 125          137
Means on Intelligence Test (Y):            102.50       102.80
σ's on Intelligence Test (Y):              33.65        31.62
Means on Mechanical Ability Test (X):      51.42        54.38
σ's on Mechanical Ability Test (X):

Correlation between the General Intelligence Test and the Mechanical Ability
Test for first-year high-school boys is .30.


$$M_{X_2} - M_{X_1} = 54.38 - 51.42 = 2.96$$

By (55)

$$\sigma_D = \sqrt{(\sigma_{M_{X_1}}^2 + \sigma_{M_{X_2}}^2)(1 - .30^2)} = .79$$

$$t = \frac{2.96}{.79} = 3.75$$


The difference between the mean scores in the Mechanical Ability
Test of the academic and technical high-school boys is 2.96 and the
$SE_D$ is .79. The t is 2.96/.79 or 3.75; and the degrees of freedom to be
used in testing this t are (125 - 1) + (137 - 1) - 1, or 259.* We
must subtract the one additional df to allow for the fact that our
groups were matched in variable Y. The general rule (p. 193) is that
1 df is subtracted for each restriction imposed upon the observations,
i.e., for each matching variable.
Entering Table D with 259 df, we find that our t of 3.75 is larger
than the entry of 2.59 at the .01 level. The obtained difference in X
(mechanical ability), therefore, though small, is highly significant,
and boys in the technical high school are reliably better on the
Mechanical Ability Test than are boys of “equal” general intelligence
in the academic high school.
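
The part played by the correlation term in formula (55) can be gauged from the figures of Example (9) alone. The sketch below starts from the reported $\sigma_D$ of .79 and r of .30 and works backward to the SE that formula (51) would have given without matching; this back-calculation is an illustration added here, not a step of the original solution.

```python
import math

r_xy = 0.30            # correlation between intelligence (Y) and mechanical ability (X)
se_matched = 0.79      # sigma_D reported in Example (9), from formula (55)
mean_diff = 2.96       # 54.38 - 51.42

reduction = math.sqrt(1 - r_xy ** 2)          # factor by which matching shrinks the SE
se_unmatched = se_matched / reduction         # what formula (51) would give (r taken as 0)

t = mean_diff / se_matched
df = (125 - 1) + (137 - 1) - 1                # one extra df lost for the matching variable

print(f"reduction factor = {reduction:.3f}, unmatched SE = {se_unmatched:.2f}")
print(f"t = {t:.2f}, df = {df}")
```

With r as low as .30 the reduction is under 5 per cent, which shows that matching on a weakly related variable gains little.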
The correlation term must be introduced into formula (55) because
when two groups have been matched in some test or tests their vari-
ability is restricted in all functions correlated with the matching
variable. Height and weight, for example, are highly correlated in
9-year-old boys. Therefore, if a group of 9-year-old boys of the same
or nearly the same height is selected, the variability in weight
of these children will be substantially reduced as compared with
9-year-old boys in general. When groups are matched for several
variables, e.g., age, intelligence, socioeconomic status, and the like,
and compared with respect to some correlated variable, the correla-
tion coefficient in formula (55) becomes a multiple coefficient of
correlation (p. 395). When $r_{xy}$ = .00, (55) reduces to (51): our
groups are independent and unrestricted by the matching variable.

* When df = 259, little is to be gained by using t instead of CR.

Groups matched for mean and σ and equivalent groups in which
individuals are paired as to score have been widely used in a variety
of psychological and educational studies. Illustrations are found in
experiments designed to evaluate the relative merits of two methods
of teaching, the effects of drugs, e.g., tobacco or caffeine, upon effi-
ciency, transfer effects of special training, and the like. Other tech-
niques useful in assessing the role of experimental factors are
described in Chapter 10.


4. The reliability of the difference between uncorrelated medians 


The reliability of the difference between two medians obtained 
from independent samples may be found from the formula 


$$\sigma_{D_{Mdn}} \text{ or } \sigma_{Mdn_1 - Mdn_2} = \sqrt{\sigma_{Mdn_1}^2 + \sigma_{Mdn_2}^2} \qquad (56)$$

(SE of the difference between two uncorrelated medians)


When medians are correlated, the value of $r_{12}$ cannot be deter-
mined accurately and the reliability of the median cannot be readily
computed. When samples are not independent, therefore, it is better
procedure to use means instead of medians.


II. The Significance of the Difference between σ's


1. The reliability of the difference between two standard deviations
(1) SE OF A DIFFERENCE WHEN σ's ARE UNCORRELATED


In many studies in psychology and education, differences in vari- 
ability which appear among groups are a matter of considerable 
importance. The student of race, sex, and experimentally induced
differences is oftentimes more interested in knowing whether his
groups differ significantly in SD than in knowing whether they differ
in mean achievement. And the educational psychologist who is in-
vestigating a new way of teaching arithmetic may want to know
whether the new method has led to changes in variability greater
than those brought about by the old method.

When samples are independent, i.e., when different groups are
studied, or when tests given to the same group are uncorrelated, the
reliability of a difference between two σ's may be found thus:

$$\sigma_{D_\sigma} \text{ or } \sigma_{\sigma_1 - \sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2} \qquad (57)$$

(SE of the difference between uncorrelated σ's when N's are large)


where $\sigma_{\sigma_1}$ is the SE of the first σ and $\sigma_{\sigma_2}$ is the SE of the second σ
(p. 195).
By way of illustration, we may apply this formula to the data of
the Norwegians and Belgians on page 214. The σ of the Norwegians'
scores on the combined scale was 2.47; of the Belgians' scores on the
same test, 2.42. Is this very small difference in variability signifi-
cant? Calling the σ of the Norwegians' scores $\sigma_1$ and the σ of the
Belgians' scores $\sigma_2$, we have

$$\sigma_{\sigma_1} = \frac{.71 \times 2.47}{\sqrt{N_1}} = .071 \quad \text{by (43)}$$

$$\sigma_{\sigma_2} = \frac{.71 \times 2.42}{\sqrt{N_2}} = .151$$

$$\sigma_{D_\sigma} = \sqrt{(.071)^2 + (.151)^2} = .167 \text{ or } .17 \text{ (to two decimals)}$$


The obtained difference in the σ's is .05 (2.47 - 2.42), and CR is
.05/.17 or .30. On the null hypothesis ($\sigma_1 - \sigma_2 = 0$), this CR (Table
D, last line), is far short of 1.96, the .05 level. As we suspected, the
obtained difference is clearly not significant; and there is no reason
to suspect that the two groups are not about equally variable.
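
The last two steps of the Norwegian-Belgian comparison can be checked from the two standard errors as reported (.071 and .151). The sketch does not recompute those standard errors, since the N's are given on another page.

```python
import math

sigma_1, sigma_2 = 2.47, 2.42            # SD's of the two groups
se_sigma_1, se_sigma_2 = 0.071, 0.151    # SE's of the two SD's, as reported

se_diff = math.sqrt(se_sigma_1 ** 2 + se_sigma_2 ** 2)   # formula (57)
cr = (sigma_1 - sigma_2) / se_diff                        # critical ratio

print(f"SE of difference = {se_diff:.3f}, CR = {cr:.2f}")
print("significant at .05" if cr >= 1.96 else "not significant at .05")
```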

Formula (57) is adequate for testing the significance of the differ-
ence between two uncorrelated SD's when N's are large (greater
than 50, say). But formula (57) is not accurate when N's are small,
as the SD's computed from small samples drawn at random from
the same normal population will exhibit a skewed distribution
around the population σ. (See Figure 47 for normal sampling dis-
tribution of means.) Instead of testing the difference between two
SD's obtained from small independent samples, therefore, by
formula (57) we divide the larger of the two variances (SD²) by the
smaller and test the significance of this ratio, called F,* by the one-
tailed test. We then double the probability (P) so found, in order to
test the general (null) hypothesis, namely, that the two variances do
not differ.

We may illustrate the method of using the F-ratio with Example
(4), page 223, in which $N_1$ = 6 and $N_2$ = 10, and the sums of squares
around the two means are $\Sigma x_1^2$ = 110 and $\Sigma x_2^2$ = 352, respectively.
The first variance (viz., $SD_1^2$) is 110/5 or 22; and the second
variance ($SD_2^2$) is 352/9 or 39.1. The F-ratio found by dividing the
larger by the smaller variance is, then, 39.1/22 or 1.78; and entering
Table F with $n_1$ = 9 (df of larger variance) and $n_2$ = 5 (df of
smaller variance), we get the two entries 4.78 and 10.15. As given in
the table, the first of these is the F-ratio significant at the .05 level,
and the second is the F-ratio significant at the .01 level. However,
since we have used the one-tailed test (have divided only the larger
variance by the smaller), these two F-ratios, viz., 4.78 and 10.15,
really represent the .10 and the .02 levels of confidence (see p. 217).
Our F of 1.78 falls far below the smaller of these values (namely,
4.78) and hence is not significant at the .10 level, much less at the
.05 or .01 levels. There is no evidence, therefore, that the two groups
really differ with respect to variability.
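
The F-ratio computation just described may be sketched as follows. The
function name is hypothetical, and the critical value 4.78 is simply the
Table F entry quoted above for 9 and 5 df; the sketch does not compute
probabilities from the F distribution itself.

```python
def variance_ratio(sum_sq_1, n_1, sum_sq_2, n_2):
    """Return (F, df of larger variance, df of smaller variance).

    Each variance is the sum of squares around the mean divided by N - 1;
    the larger variance is always placed in the numerator.
    """
    var_1 = sum_sq_1 / (n_1 - 1)
    var_2 = sum_sq_2 / (n_2 - 1)
    if var_1 >= var_2:
        return var_1 / var_2, n_1 - 1, n_2 - 1
    return var_2 / var_1, n_2 - 1, n_1 - 1

f_ratio, df_larger, df_smaller = variance_ratio(110, 6, 352, 10)

# Table F entry quoted in the text for (9, 5) df. Because only the larger
# variance is divided by the smaller, it marks the .10 level, not the .05.
TABLE_F_ENTRY = 4.78

print(f"F = {f_ratio:.2f} with {df_larger} and {df_smaller} df")
print("exceeds the tabled value:", f_ratio > TABLE_F_ENTRY)
```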


(2) SE OF A DIFFERENCE WHEN σ'S ARE CORRELATED

When we compare the σ's of the same group upon two occasions or
the σ's of equivalent groups on a final test, we must take into account
possible correlation between the σ's in the two groups being
compared. The formula for testing the significance of an obtained
difference in variability when SD's are correlated is

$\sigma_{D_\sigma} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2 - 2 r_{12}^2 \sigma_{\sigma_1} \sigma_{\sigma_2}}$ (58)

(SE of the difference between correlated σ's when N's are large)

where $\sigma_{\sigma_1}$ and $\sigma_{\sigma_2}$ are the SE's of the two SD's and $r_{12}^2$ is the square
of the coefficient of correlation between scores in initial and final
tests or between final scores of equivalent groups.†
* See pages 278-281 for explanation of the F-ratio; and page 429 for Table F.
† The correlation between the SD's of samples drawn from a given
population equals the square of the coefficient of correlation between the
test scores, the SD's of which are being compared.

Formula (58) may be applied to the problems on page 226 by
way of illustration. In the first problem, the SD of 64 sixth-grade
children was 6.0 on the initial and 5.0 on the final test. Is there a
significant drop in variability in reading after a year's schooling?
Putting $\sigma_1$ = 6.0 and $\sigma_2$ = 5.0, we have

$\sigma_{\sigma_1} = \frac{.71 \times 6.0}{\sqrt{64}} = .53$ by (43)

$\sigma_{\sigma_2} = \frac{.71 \times 5.0}{\sqrt{64}} = .44$

The coefficient of correlation between initial and final scores is .60,
so that $r_{12}^2$ = .36. Substituting for $r_{12}^2$ and the $\sigma_\sigma$'s in formula (58)
we have

$\sigma_{D_\sigma} = \sqrt{(.53)^2 + (.44)^2 - 2 \times .36 \times .53 \times .44} = .55$

The difference between the two σ's is 1.0 and the SE of this difference
is .55. Therefore, on the null hypothesis of equal σ's, t = 1.0/.55
or 1.80. Entering Table D with 63 df, we find t at the .05 level to be
2.00. The obtained t does not quite reach this point, and there is no
reason to suspect a true difference in variability between initial and
final reading scores.


In the equivalent groups problem on page 228, the SD of the
experimental group on the final test was 24.36 and the SD of the control
group on the final test was 21.62. The difference between these
SD's is 2.74 and the number of children in each group is 72. Did
the incentive (praise) produce significantly greater variability in the
experimental group as compared with the control? Putting
$\sigma_1$ = 24.36 and $\sigma_2$ = 21.62, we have

$\sigma_{\sigma_1} = \frac{.71 \times 24.36}{\sqrt{72}} = 2.04$ by (43)

$\sigma_{\sigma_2} = \frac{.71 \times 21.62}{\sqrt{72}} = 1.81$

The r between final test scores in the experimental and control
groups is .65 and $r_{12}^2$, therefore, is .42. Substituting for $r_{12}^2$ and the
two SE's in formula (58) we have

$\sigma_{D_\sigma} = \sqrt{(2.04)^2 + (1.81)^2 - 2 \times .42 \times 2.04 \times 1.81} = 2.08$

Dividing 2.74 by 2.08, our t is 1.32; and for 71 degrees of freedom
this t falls well below the .05 level of 2.00. There is no evidence,
therefore, that the incentive increased variability of response to the
test.
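
Both worked examples for formula (58) can be reproduced with a short
sketch. The function names are hypothetical. The SE of each SD is taken
from formula (43) as .71σ/√N, and intermediate values are not rounded,
so results may differ from the printed ones in the second decimal.

```python
import math

def se_of_sd(sd, n):
    """Formula (43): SE of an SD, .71 * SD / sqrt(N)."""
    return 0.71 * sd / math.sqrt(n)

def se_diff_correlated_sds(sd_1, sd_2, n, r):
    """Formula (58): SE of the difference between two correlated SD's.

    r is the correlation between the two sets of scores; the correlation
    between the SD's themselves is taken as r squared.
    """
    se_1 = se_of_sd(sd_1, n)
    se_2 = se_of_sd(sd_2, n)
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r ** 2 * se_1 * se_2)

# Reading problem: 64 children, SD 6.0 initially and 5.0 finally, r = .60.
se_reading = se_diff_correlated_sds(6.0, 5.0, 64, 0.60)
t_reading = (6.0 - 5.0) / se_reading

# Praise problem: 72 children per group, SD's 24.36 and 21.62, r = .65.
se_praise = se_diff_correlated_sds(24.36, 21.62, 72, 0.65)
t_praise = (24.36 - 21.62) / se_praise

print(f"reading: SE = {se_reading:.2f}, t = {t_reading:.2f}")
print(f"praise:  SE = {se_praise:.2f}, t = {t_praise:.2f}")
```

Neither t reaches 2.00, the .05 value quoted from Table D, which agrees
with the conclusions in the text.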


III. The Significance of the Difference between Percentages
and Correlation Coefficients 


1. The reliability of the difference between two percents 


(1) SE OF THE DIFFERENCE WHEN PERCENTS ARE UNCORRELATED

On page 196, the formula for the SE of a percentage was given as

$\sigma_P = \sqrt{\frac{PQ}{N}}$

where P = percent occurrence of the observed behavior,
Q = (1 - P), and N is the size of the sample. One of the most useful
applications of the SE formula is in determining the significance of
the difference between two percents. In much experimental work,
especially in social and abnormal psychology, we are able to get the
percent occurrence of a given behavior in two or more independent
samples. We then want to know whether the incidence of this
behavior is reliably different in the two groups. The following problem,
which repeats part of Example (1), page 196, will provide an
illustration.


Example (1) In a study of cheating * among elementary-school
children, 144 or 41.4% of 348 children from homes of good
socioeconomic status were found to have cheated on various tests. In
the same study, 133 or 50.2% of 265 children from homes of poor
socioeconomic status also cheated on the same tests. Is there a
true difference in the incidence of cheating in these two groups?


Let us set up the hypothesis that no true difference exists as
between the percentages cheating in the two groups and that, with
respect to cheating, both samples have been randomly drawn from
the same population. A useful procedure in testing this null
hypothesis is to consider $P_1$ (41.4%) and $P_2$ (50.2%) as being
independent determinations of the common population parameter, P; and
to estimate P by pooling $P_1$ and $P_2$ (see p. 224). A pooled estimate
of P is obtained from the equation:

$P = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$

Q being, of course, (1 - P).

* Data from Hartshorne, H., and May, M. A., Studies in Deceit (New York:
Macmillan, 1928), Book II, p. 161.




The estimated percentages, P and Q, may now be put in formula
(59) to give the SE of the difference between $P_1$ and $P_2$.

$\sigma_{D_\%}$ or $\sigma_{P_1 - P_2} = \sqrt{PQ\left[\frac{1}{N_1} + \frac{1}{N_2}\right]}$ (59)

(SE of the difference between two uncorrelated percentages)

In the present example, $P = \frac{348 \times 41.4 + 265 \times 50.2}{348 + 265}$ or 45.2%,
and Q = (1 - P) or 54.8%. Substituting these two values in (59)
we get

$\sigma_{P_1 - P_2} = \sqrt{45.2 \times 54.8\left[\frac{1}{348} + \frac{1}{265}\right]} = 4.06\%$

The difference between the two percents $P_1$ and $P_2$ is 8.8%
(50.2 - 41.4); and dividing by 4.06 ($\sigma_{P_1 - P_2}$) we get a
CR of 2.17. Entering Table D, last line (there are 611 df), we find
that our CR exceeds 1.96 (.05 level) but does not reach 2.58 (.01
level). We may be reasonably confident, therefore, that our two
groups do not come from a common population and that the
occurrence of cheating in the two groups is reliably different.
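
The pooling step and formula (59) can be sketched in a few lines. The
function name is hypothetical; percents are kept on the 0 to 100 scale
used in the example, and the pooled P is not rounded before use.

```python
import math

def se_diff_uncorrelated_percents(p_1, n_1, p_2, n_2):
    """Return (pooled P, SE of P1 - P2) by formula (59); percents in 0-100."""
    pooled_p = (n_1 * p_1 + n_2 * p_2) / (n_1 + n_2)
    pooled_q = 100.0 - pooled_p
    se = math.sqrt(pooled_p * pooled_q * (1 / n_1 + 1 / n_2))
    return pooled_p, se

# Cheating data: 41.4% of 348 children and 50.2% of 265 children.
pooled_p, se_diff = se_diff_uncorrelated_percents(41.4, 348, 50.2, 265)
critical_ratio = (50.2 - 41.4) / se_diff

print(f"pooled P = {pooled_p:.1f}%")
print(f"SE of difference = {se_diff:.2f}%")
print(f"CR = {critical_ratio:.2f}")
```

The CR falls between 1.96 and 2.58, that is, between the .05 and .01
levels, as in the text.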
(2) SE OF THE DIFFERENCE WHEN PERCENTS ARE CORRELATED

Responses recorded in percentages may be, and usually are,
correlated when individuals have been paired or matched in some
attribute; or when the same group gives answers (e.g., “Yes” or “No”)
to the same questions or items. To illustrate with an example:

Example (2) A large group of veterans (250 *) answered as
follows the two questions:

1. Do you have a great many bad headaches?        Yes 150   No 100
2. Are you troubled with fears of being crushed
   in a crowd?                                    Yes 125   No 125

                 #1                                #1
             No      Yes                       No      Yes
             (b)     (a)                       (b)     (a)
       Yes    25     100  | 125          Yes   10%     40%  |  50%
   #2        (d)     (c)             #2        (d)     (c)
       No     75      50  | 125          No    30%     20%  |  50%
             100     150    250                40%     60%    100%

* The data have been simplified for illustrative purposes.




The data in the 2 × 2 table on the left show the number who
answered “Yes” to both questions, “No” to both questions, “Yes” to
one and “No” to the other. In the second diagram (on the right)
frequencies are expressed as percents of 250. The letters a, b, c, and d
are to designate the four cells (p. 363). We find that a total of 60%
answered “Yes” to Question 1, and that a total of 50% answered
“Yes” to Question 2. Is this difference between the questions
significant?

The general formula for the significance of the difference between
two correlated percents is

$\sigma_{P_1 - P_2} = \sqrt{\sigma_{P_1}^2 + \sigma_{P_2}^2 - 2 r_{P_1 P_2} \sigma_{P_1} \sigma_{P_2}}$ (60)

(SE of difference between two correlated percents)

in which r between the two percents is given by the phi-coefficient
(p. 367), a ratio equivalent to the correlation coefficient in 2 × 2
tables.

If $P_1$ and $P_2$ have been averaged in order to provide an estimate of
P, the population parameter, formula (60) becomes

$\sigma_{P_1 - P_2} = \sqrt{2\sigma_P^2 (1 - r_{P_1 P_2})}$ (61)

(SE of the difference between two correlated percents when
P is estimated from $P_1$ and $P_2$)

In example (2), $P_1$ = 60% and $P_2$ = 50%, so that P = 55% and
Q = 45%. Substituting in (61) we have that

$\sigma_{P_1 - P_2} = \sqrt{\frac{2 \times .55 \times .45}{250}(1 - .408)} = .0342$ *

The obtained difference of .10 (.60 - .50) divided by .034 gives a
CR of 2.94. From Table D, last line, we find that this critical ratio
exceeds 2.58, the .01 level. We abandon the null hypothesis,
therefore, and conclude that our groups differed significantly in their
answers to the two questions.

A simpler formula than (61) which avoids the calculation of the
correlation coefficient may be used when P has been estimated from
$P_1$ and $P_2$ under the null hypothesis. This formula † is

$\sigma_{P_1 - P_2} = \sqrt{\frac{b + c}{N}}$ (62)

(SE of the difference between two correlated percentages)

In example (2) we read from the second diagram that c = 20%
and b = 10%, N being 250. Substituting in (62) we have

$\sigma_{P_1 - P_2} = \sqrt{\frac{.10 + .20}{250}} = .034$

which checks the result obtained from (61).

* The phi-coefficient of .408 was found from formula (93), page 367.
† McNemar, Q., “Note on the Sampling Error of the Difference between
Correlated Proportions or Percentages,” Psychometrika, 1947, 12, 153-157.
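
The agreement between formulas (61) and (62) can be checked from the
cell frequencies of Example (2). The sketch below is illustrative only:
the function names are hypothetical, and the phi-coefficient is computed
with the usual 2 × 2 product formula, which is assumed here to be what
formula (93) on page 367 gives.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with cells lettered as in Example (2):
    a = Yes to both, d = No to both, b and c = the two mixed cells."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def se_by_formula_61(p_1, p_2, n, r):
    """Formula (61): P is estimated as the average of P1 and P2 (proportions)."""
    p = (p_1 + p_2) / 2
    return math.sqrt(2 * p * (1 - p) / n * (1 - r))

def se_by_formula_62(b_prop, c_prop, n):
    """Formula (62): uses only the two 'disagreeing' cells, as proportions."""
    return math.sqrt((b_prop + c_prop) / n)

# Frequencies from the left-hand table of Example (2).
a, b, c, d = 100, 25, 50, 75
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
se_61 = se_by_formula_61(0.60, 0.50, n, phi)
se_62 = se_by_formula_62(b / n, c / n, n)
critical_ratio = (0.60 - 0.50) / se_61

print(f"phi = {phi:.3f}")
print(f"SE by (61) = {se_61:.4f}, SE by (62) = {se_62:.4f}")
print(f"CR = {critical_ratio:.2f}")
```

The two SE's agree to about three decimals. The CR computed from the
unrounded SE is a little smaller than the 2.94 obtained with .034, but
it still exceeds 2.58.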


2. The reliability of the difference between two r's 


A useful and mathematically exact method of determining the SE
of the difference between two r's requires that we first convert the
r's into Fisher's z-function. The significance of the difference
between two z's is then determined. The formula for the SE of the
difference between two z's is

$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$ (63)

(SE of the difference between two z coefficients)

where $\sigma_z = \frac{1}{\sqrt{N - 3}}$.*

The following example will illustrate the procedure.

Example (3) The r between intelligence and achievement in the
freshman class of College A is .40, for N = 400. And the r between
intelligence and achievement in the freshman class of College B is
.50 for N = 600. Is the relationship between intelligence and
achievement higher in College B than in College A?

From Table C we read that r's of .40 and .50 correspond to z's of
.42 and .55, respectively. If we put $N_1$ = 400 and $N_2$ = 600, we have
on substituting in (63)

$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{(400 - 3)} + \frac{1}{(600 - 3)}} = .065$

Dividing .13 (.55 - .42) by .065, we obtain a CR of 2.00. This CR
exceeds slightly the value 1.96 and hence is significant at the .05
level. Based on the evidence we have, the r = .50 in College B is
reliably higher than the r = .40 in College A.

* The two correlated variables take away 2 degrees of freedom; and the
transformation into z adds another restriction. Hence we subtract 3 from each
N (see p. 193).

The z transformation for r is especially useful when r's are
very high, as the sampling distributions of such r's are known to be
skewed, often badly so. To illustrate, suppose that r between two
achievement tests is .87 in Grade 6 ($N_1$ = 50) and that the r between
the same tests is .72 in Grade 7 ($N_2$ = 65). Is there a significant
difference between these two r's?

From Table C we find that r's of .87 and .72 yield z's of 1.33 and
.91, respectively; and substituting $N_1$ and $N_2$ in formula (63) we
have

$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{(50 - 3)} + \frac{1}{(65 - 3)}} = .193$

Dividing .42 (1.33 - .91) by .193 we get a CR of 2.18, well above
the .05 level of 1.96 but below the .01 level of 2.58. We may discard
the null hypothesis, therefore, and mark the difference between our
r's significant at the .05 level.
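
The two examples can be reworked without Table C by computing z
directly. The sketch below assumes that Fisher's z is the inverse
hyperbolic tangent of r, which is what `math.atanh` returns; the
function name is hypothetical. Because the tabled z's are rounded to
two decimals, the CR's differ slightly from the printed ones. For the
college example the unrounded CR comes out near 1.94 rather than 2.00,
so a borderline result of this kind should be read with some care.

```python
import math

def cr_between_two_rs(r_1, n_1, r_2, n_2):
    """Return (SE of z1 - z2 by formula (63), critical ratio)."""
    z_1 = math.atanh(r_1)  # Fisher's z for the first r
    z_2 = math.atanh(r_2)  # Fisher's z for the second r
    se = math.sqrt(1 / (n_1 - 3) + 1 / (n_2 - 3))
    return se, (z_1 - z_2) / se

# College B (r = .50, N = 600) against College A (r = .40, N = 400).
se_college, cr_college = cr_between_two_rs(0.50, 600, 0.40, 400)

# Grade 6 (r = .87, N = 50) against Grade 7 (r = .72, N = 65).
se_grade, cr_grade = cr_between_two_rs(0.87, 50, 0.72, 65)

print(f"colleges: SE = {se_college:.3f}, CR = {cr_college:.2f}")
print(f"grades:   SE = {se_grade:.3f}, CR = {cr_grade:.2f}")
```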

Measurement of the significance of the difference between two r's
obtained from the same sample presents certain complications, as r's
from the same group are presumably correlated. Formulas for
computing the correlation between two correlated r's are not entirely
satisfactory and there is no method of determining the correlation
between two z's directly. Fortunately, we may feel sure that if the
r's are positively correlated in our group, and the CR as determined
by the SE from (63) is significant, the CR would be even more
significant if the correlation between the r's were known.

The z-transformation can be usefully employed when r's which
differ widely in size are to be averaged or combined (p. 198).


IV. The Significance of Deviations from Normality 


Distributions which show deviations from the normal form are
said to exhibit skewness or kurtosis or both. Skewed distributions
are asymmetric or off-center, shifted to the right or left (Figs. 23
and 24, p. 98); while distributions showing kurtosis are more
flattened or peaked than the normal (Fig. 25, p. 100). In many studies
the investigator wants to know whether his distributions are too
atypical or deviant to be treated as normal, or whether their
departures from normality are relatively mild and non-significant.
Exact tests of the significance of various degrees of skewness or
kurtosis will be found in more advanced text books.* The
approximate tests of significance given in this section are accurate enough
for many purposes and are relatively easy to apply.


1. The reliability of the percentile measure of skewness 


On page 99, the following formula was given for estimating the
skewness of a frequency distribution in terms of its median and
certain percentiles:

$Sk = \frac{(P_{90} + P_{10})}{2} - P_{50}$ (20)

According to this formula, the skewness of the 50 Army Alpha scores
in Table 1, page 5, is -2.50. The problem, then, is to determine
whether this degree of skewness represents a significant departure
from zero, the skewness of the normal curve. The SE of the measure
of skewness given above is

$\sigma_{Sk} = \frac{.5185\,D}{\sqrt{N}}$ (64)

[SE of the measure of skewness given in formula (20)]

in which D = ($P_{90} - P_{10}$).

In the frequency distribution of the 50 Army Alpha scores,
$P_{90}$ = 187, $P_{10}$ = 152, and D = 35. From formula (64), therefore,

$\sigma_{Sk} = \frac{.5185 \times 35}{\sqrt{50}} = 2.57$

The deviation of our measure of skewness from 0 skewness is -2.50,
and dividing -2.50 by 2.57 (CR = Sk/$\sigma_{Sk}$) we get a CR of -.97.
Note that the minus sign of -2.50 indicates simply the direction of
skewness. Our Sk, therefore, deviates -.97 $\sigma_{Sk}$ from 0, the measure
of skewness in the normal curve. From Table D we find that -.97
falls well within the ±1.96 limits, which determine the .05 level of
significance. Hence it is clear that -2.50 represents no real
deviation of this frequency distribution from normality.

* Johnson, Palmer O., Statistical Methods in Research (New York:
Prentice-Hall, Inc., 1949), Chap. 7.

The skewness of the distribution of 200 cancellation scores (p. 99)
is .03 by formula (20). Since $P_{90}$ = 128.5, $P_{10}$ = 110.4, and D = 18.1,
the SE of Sk is

$\sigma_{Sk} = \frac{.5185 \times 18.1}{\sqrt{200}} = .66$

Dividing .03 by .66, we get .046; and from Table D we find that this
CR is far short of 1.96, the .05 level of significance. In fact, this
distribution is almost perfectly symmetrical as is shown in Figure 5,
page 18.
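
Formula (64) and the resulting CR can be sketched as below. The
function name is hypothetical, and the Sk values (-2.50 and .03) are
taken as given in the text rather than recomputed, since the medians
are not quoted in this section.

```python
import math

def se_of_skewness(p_90, p_10, n):
    """Formula (64): SE of the percentile measure of skewness."""
    spread = p_90 - p_10  # D in the text
    return 0.5185 * spread / math.sqrt(n)

# 50 Army Alpha scores: P90 = 187, P10 = 152, Sk = -2.50.
se_alpha = se_of_skewness(187, 152, 50)
cr_alpha = -2.50 / se_alpha

# 200 cancellation scores: P90 = 128.5, P10 = 110.4, Sk = .03.
se_cancel = se_of_skewness(128.5, 110.4, 200)
cr_cancel = 0.03 / se_cancel

print(f"Army Alpha:   SE = {se_alpha:.2f}, CR = {cr_alpha:.2f}")
print(f"cancellation: SE = {se_cancel:.2f}, CR = {cr_cancel:.3f}")
```

Both CR's lie well inside ±1.96, in agreement with the conclusions
drawn above.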


2. The reliability of the percentile measure of kurtosis 

The formula below for measuring kurtosis in terms of Q and
certain percentiles in the distribution was given on page 100:

$Ku = \frac{Q}{(P_{90} - P_{10})}$ (21)

The kurtosis of the frequency distribution of 50 Army Alpha scores
(p. 00) by formula (21) is .237; and this Ku deviates -.026 from
.263, the Ku of the normal distribution (p. 100). The negative
direction of the deviation indicates that the distribution tends toward
leptokurtosis.

To estimate the significance of the deviation of our Ku (-.026) from
the Ku of the normal curve, we may calculate the SE of Ku by the
following formula:

$\sigma_{Ku} = \frac{.28}{\sqrt{N}}$ (65)

[SE of the measure of Ku given by formula (21)]

in which N is, of course, the size of the sample.

For the 50 Army Alpha scores (p. 5), $\sigma_{Ku} = \frac{.28}{\sqrt{50}}$ or .039, and
the CR (Ku/$\sigma_{Ku}$) is -.026/.039 or -.67. This CR is less than 1.96,
the .05 significance level, and there is no evidence, so far as our test
is concerned, that this distribution is really more peaked than the
normal.

The kurtosis of the 200 cancellation scores (p. 13) is .223 by
formula (21). This Ku deviates -.040 from .263, the Ku of the
normal curve. Again the direction of the deviation is toward
leptokurtosis. The SE of our Ku of .223 is .020 by formula (65); and
Ku/$\sigma_{Ku}$ is -.040/.020 or -2.00. Deviation from normal kurtosis is
slightly greater than 1.96, the .05 significance level, but less than
2.58, the .01 significance level. The narrow dispersion of this
distribution (Q = 4.04) and the fairly large N lead to a heavy
concentration of cases in the middle range; and these factors could well
account for the strong tendency of this distribution to be more
peaked than the normal. Leptokurtosis is not apparent in the curve
itself (Fig. 5, p. 18).
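
The kurtosis test may be sketched in the same way. The function name is
hypothetical; the constant .28 is taken from formula (65) as printed and
.263 is the Ku of the normal curve quoted above. Because the SE is not
rounded before dividing, the CR's may differ from the printed values in
the second decimal.

```python
import math

KU_NORMAL = 0.263  # Ku of the normal distribution, as quoted in the text

def kurtosis_cr(ku, n):
    """Return (SE of Ku by formula (65), CR of the deviation from .263)."""
    se = 0.28 / math.sqrt(n)
    return se, (ku - KU_NORMAL) / se

# 50 Army Alpha scores, Ku = .237.
se_alpha, cr_alpha = kurtosis_cr(0.237, 50)

# 200 cancellation scores, Ku = .223.
se_cancel, cr_cancel = kurtosis_cr(0.223, 200)

print(f"Army Alpha:   SE = {se_alpha:.3f}, CR = {cr_alpha:.2f}")
print(f"cancellation: SE = {se_cancel:.3f}, CR = {cr_cancel:.2f}")
```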


PROBLEMS 


1. The difference between two means is 3.60 and $\sigma_D$ = 3. Both samples
   are larger than 100.
   (a) Is the obtained difference significant at the .05 level?
   (b) What percent is the obtained difference of the difference necessary
       for significance at the .01 level?
   (c) Find the limits of the .99 confidence-interval for the true difference.

2. A personality inventory is administered in a private school to 8 boys
   whose conduct records are exemplary, and to 5 boys whose records are
   very poor. Data are given below.
Group 1: 110 19 95 105 111 97 112 102 
Group 2: 115 112 109 112 117

   Is the difference between group means significant at the .05 level?
   at the .01 level?

3. In which of the following experimental problems would it be more
   important to avoid Type I errors of inference than Type II errors in
   determining the significance of a difference?
   (a) Sex differences in reading rate and comprehension, in the fifth
       grade.
   (b) Effects of a new drug upon reaction time, especially when the
       drugs are potent and probably dangerous.
   (c) Comparison of two methods of learning a new skill.
   (d) Acceptance of a program which involves much time and money
       and rejection of a less expensive program.
   (e) Comparative efficiency of a speed-up and a normal rate of work
       in a factory.

4. In the first trial of a practice period, 25 twelve-year-olds have a mean
   score of 80.00 and a SD of 8.00 upon a digit-symbol learning test. On the
   tenth trial, the mean is 84.00 and the SD is 10.00. The r between scores
   on the first and tenth trials is .40. Our hypothesis is that practice leads
   to gain.
   (a) Is the gain in score significant at the .05 level? at the .01 level?
       (p. 217)
   (b) What gain would be significant at the .01 level, other conditions
       remaining the same?

5. Two groups of high-school pupils are matched for initial ability in a
   biology test. Group 1 is taught by the lecture method, and Group 2
   by the lecture-demonstration method. Data are as follows:

                                                Group 1      Group 2
                                               (control)  (experimental)
   N                                               60           60
   Mean initial score on the biology test        42.30        42.50
   σ of initial scores on the biology test        5.36         5.38
   Mean final score on the biology test          54.54        56.74
   σ of final scores on the biology test          6.34         7.25
   r (between final scores on the biology test) = .50

   (a) Is the difference between the final scores made by Groups 1 and 2
       upon the biology test significant at the .05 level? at the .01 level?
   (b) Determine the limits of the .95 confidence-interval for the true
       difference.
   (c) Is the difference in the variability of the final scores made by
       Groups 1 and 2 significant at the .05 level?


6. Two groups of high-school students are matched for M and o upon 
a group intelligence test. There are fifty-eight subjects in Group A and 
seventy-two in Group B. The records of these two groups upon a bat- 
tery of “learning” tests are as follows: 


Group A Group B 
M 48.52 53.61 
іа TE] . - 1535 
N 58 72 


The correlation of the group intelligence test and the learning battery
in the entire group from which A and B were drawn is .50. Is the
difference between Groups A and B significant at the .05 level? at the .01
level?


7. Calculate measures of skewness and kurtosis for the first two
   distributions in Chapter 9, problem 1, page 40. Compute standard errors of
   Sk and Ku by the formulas given on pages 241 and 242. Determine
   whether either of these distributions departs significantly from the
   normal form.

8. In a school of 500 pupils, 52.3% are girls; and in a second school of
   300 pupils, 47.7% are girls. Is there a significant difference between
   the percentages of girls enrolled in the two schools?

9. Given the following data for an item in Stanford-Binet: of 100
   nine-year-olds, 72% pass; of 100 ten-year-olds, 78% pass. Is the item more
   difficult for nine-year-olds than for ten-year-olds?

10. (a) To the question “Would you like to be an aviator?” 145
        fifteen-year-old boys in a high-school class of 205 answered “Yes” and 60
        answered “No.” To the question “Would you like to be an
        engineer?” 125 said “Yes” and 80 answered “No.” The data in the
        table below show the number who answered “Yes” to both
        questions, “No” to both questions, “Yes” to one and “No” to the other.
        Is desire to be an aviator significantly stronger in this group than
        desire to be an engineer?

                          Ques. 1
                        No      Yes
                 Yes    25      100  | 125
        Ques. 2
                 No     35       45  |  80
                        60      145    205


    (b) In a group of 64 seventh-grade children, 32 answered Item 23
        correctly and 36 answered Item 26 correctly. From the table below,
        determine whether the difference in the percentage of correct
        answers is significant.

                          Item 23
                         -       +
                  +     10      26  |  36
        Item 26
                  -     22       6  |  28
                        32      32     64
11. In random samples of 100 cases each from four groups, A, B, C, and D,
    the following results were obtained:

                 A         B         C         D
    Mean      101.00    104.00     93.00     86.00
    σ          10.00     11.00      9.60      8.50

    What are the chances that, in general, the mean of
    (a) the B's is higher than the mean of the A's.
    (b) the A's is higher than the mean of the C's.
    (c) the C's is higher than the mean of the D's.

    What are the chances that
    (a) any B will be better than the mean A.
    (b) any B will be better than the mean C.
    (c) any B will be better than the mean D.

12. (a) The correlation between height and weight in a sample of 200
        ten-year-old boys is .70; and the correlation between height and weight
        in a sample of 250 ten-year-old girls is .62. Is this difference
        significant?
    (b) In a sample of 150 high-school freshmen the correlation of two
        educational achievement tests is .65. If from past years the
        correlation has averaged .60, is the present group atypical? (Does .65
        differ significantly from .60?)

ANSWERS 


1. (a) No. CR = 1.20   (b) 46.5%   (c) -4.14 and 11.34
2. t = 2.3; for 11 df, significant at .05, not at .01 level
3. a, c and d
4. (a) Significant at .05, not at .01 level. Since t = 2.00 there is
       approximately 1 chance in 50 that a plus difference (gain) of 4 would
       occur under the null hypothesis.
   (b) 4.98
5. (a) t = 2.49; difference in M's significant at .05 but not at .01 level.
   (b) .43 to 3.97
   (c) No. t = 1.20
6. Significant at .05 level (t = 2.57) and almost significant at .01 level.
7. Distribution   Sk/$\sigma_{Sk}$   Ku/$\sigma_{Ku}$
        1           -.23       .55    Deviation from normality not significant
        2            .51      -.38    Deviation from normality not significant
8. No. CR = 1.24
9. No. CR = .98
10. (a) Significant at .05, not at .01 level (CR = 2.03)
    (b) Not significant (CR approximately 1.00)
11. (a) 98 in 100
    (b) more than 99 in 100
    (c) more than 99 in 100
    (a) 61 in 100
    (b) 84 in 100
    (c) 95 in 100
12. (a) No. CR = 1.47   (b) No. CR = 1.09


TESTING EXPERIMENTAL HYPOTHESES


The hypothesis proposed in a psychological experiment may take
the form of a general theory or a specific inquiry. A specific
hypothesis is ordinarily to be preferred to a general proposal, as the more
definite and exact the query the greater the likelihood of a
conclusive answer. In the preceding chapter, the significance of an obtained
difference was tested against a null hypothesis. In the present
chapter, we shall consider further the nature of hypotheses and shall
present certain useful procedures and methods for answering the
questions raised by an experiment.


I. The Null Hypothesis 


1. Advantages of the null hypothesis


In Chapter 9 the difference between two statistics was tested
against a null hypothesis, namely, that the true difference is zero.
The null hypothesis is not confined to zero differences nor to the
differences between statistics. Other forms of this hypothesis assert
that the results found in an experiment do not differ significantly
from results to be expected on a probability basis or stipulated in
terms of some theory. A null hypothesis, as we have said on page 213,
is ordinarily more useful than other hypotheses because it is exact.
Hypotheses other than the null can, to be sure, be stated exactly: we
may, for example, assert that a group which has received special
training will be 5 points on the average ahead of an untrained
(control) group. But it is difficult to set up such precise expectations in
most experiments. And for this reason it is usually advisable to test
against a null hypothesis, rather than some other, if this can be done.

It is sometimes not fully understood that the rejection of a null
hypothesis does not immediately force acceptance of a contrary
view * (see p. 215). The extrasensory perception (ESP)
experiments † offer a good illustration of what is meant by this statement.
In a typical ESP experiment, a pack of 25 cards is used. There are
5 different symbols on these cards, each symbol appearing on 5 cards.
In guessing through the pack of 25, the probability of chance success
with each card is 1/5. And the number of correct “calls” in a pack
of 25 should be 5. If a subject calls the cards correctly much in
excess of chance expectation (i.e., in excess of 5) the null (chance)
hypothesis is rejected. But rejection of the chance hypothesis does
not force acceptance of ESP as the cause of the extra-chance result.
Before this claim can be made, one must demonstrate in follow-up
experiments that extra-chance results are obtained when all likely
causes, such as runs of cards, visual and other cues, poor shuffling
and recording, and the like have been eliminated. If under rigid
controls calls in excess of chance are consistently obtained, we may
reject the null (chance) hypothesis and accept ESP. But the
acceptance of a positive hypothesis, it should be noted, is the end
result of a series of careful experiments. And moreover, it is a logical
and not primarily a statistical conclusion.


* Morgan, J. J. B., “Credence Given to One Hypothesis Because of the
Overthrow of Its Rivals,” Amer. Jour. Psychol., 1945, 58, 54-64.
† Rhine, J. B., et al., Extra-sensory Perception after Sixty Years (New York:
Henry Holt and Co., 1940).


2. Testing experimentally observed results against the direct
determination of probable outcomes


The null hypothesis is often useful when we wish to compare
observed results with those to be expected by “chance.” Several
examples will illustrate the methods to be employed.


Example (1) Two tones, differing slightly in pitch, are to be
compared in an experiment. The tones are presented in succession,
the subject being instructed to report the second as higher or lower
than the first. Presentation is in random order. In ten trials a
subject is right in his judgment seven times. Is this result significant,
i.e., better than chance?


Since the subject is either right or wrong in his judgment, and since
judgments are separate and independent, we may test our result
against the binomial expansion (p. 90). Ten judgments may be
taken as analogous to ten coins; a right judgment corresponds to a
head, say, a wrong judgment to a tail. The odds are even that any
given judgment will be right; hence in ten trials (since p = 1/2) our
subject should in general be right five times by chance alone. The
question, then, is whether seven “rights” are significantly greater
than the expected five. From page 90 we find that upon expanding
$(p + q)^{10}$ the probability of 10 right judgments is 1/1024; of 9 right
and one wrong, 10/1024; of 8 right and 2 wrong, 45/1024; and of 7
right and 3 wrong, 120/1024. Adding these fractions we get 176/1024,
or .172 as the probability of 7 or more right judgments by chance
alone. The probability of just 7 rights is 120/1024, or approximately
.12. Neither of these results is significant at the .05 level of
confidence (p. 186) and accordingly the null hypothesis must be retained.
On the evidence there is no reason to believe that our subject's
judgments are really better than chance expectation.

Note that to get 10 right is highly significant (the probability is
approximately .001); to get 9 or 10 right is also significant (the
probability is 1/1024 + 10/1024, or approximately .01). To get 8 or
more right is almost significant at the .05 level (the probability is
.055); but any number right less than 8 fails to reach our standard.
The situation described in Example (1) occurs in a number of
experiments: whenever, for example, objects, weights, lights, test
items, or other stimuli are to be compared, the odds being 50:50 that
a given judgment is correct.
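
The binomial probabilities quoted for Example (1) can be verified
exactly. The sketch below uses exact fractions; the function name is
hypothetical, and the default p = 1/2 reflects the even odds assumed in
the example.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k_right, n_trials=10, p=Fraction(1, 2)):
    """Chance probability of k_right or more right judgments in n_trials."""
    q = 1 - p
    return sum(
        comb(n_trials, k) * p ** k * q ** (n_trials - k)
        for k in range(k_right, n_trials + 1)
    )

for k in (7, 8, 9, 10):
    tail = prob_at_least(k)
    print(f"{k} or more right: {tail} = {float(tail):.3f}")
```

The fractions are printed in lowest terms, so 176/1024 appears as 11/64.
Only the tails for 9 or more and for 10 right fall below .05; the tail
for 8 or more is about .055, as stated above.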

Example (2) Ten photos, 5 of feeble-minded and 5 of normal 
children (of the same age and sex), are presented to a subject who 
claims he can identify the feeble-minded from their photographs. 
The subject is instructed to designate which five photographs are 
those of feeble-minded children. How many photos must our sub- 
ject identify correctly before the null hypothesis is disproved? 

Since there are 5 feeble-minded and 5 normal photos, the subject
has a 50:50 chance of success with each photo and the method of
Example (1) could be used. A better test,* however, is to determine
the probability that a particular set of 5 photos (namely, the right
five) will be selected from all possible sets of 5 which may be drawn
from the 10 given photos. To find how many combinations of 5
photos can be drawn from a set of 10, we may conveniently use the
formula for the combination of 10 things taken 5 at a time. This
formula * is written

$$C^{10}_{5} = \frac{10!}{5!\,5!} = 252.$$

The symbol $C^{10}_{5}$ is read “the combinations of ten things taken
five at a time”; 10! (read “10 factorial”) is
10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1; and 5! is 5 × 4 × 3 × 2 × 1.

* Fisher, R. A., The Design of Experiments (London: Oliver and Boyd,
1935), Chapter 2, pp. 26-29 especially.

It is possible, therefore, to draw 252 combinations of 5 from a set 
of 10, and accordingly there is one chance in 252 that a judge will 
select the 5 correct photos out of all possible sets of 5. If he does 
select the right 5, this result is obviously significant (the probability 
is approximately .004) and the null hypothesis must be rejected. 
Suppose that our judge's set of 5 photos contains 4 feeble-minded
and one normal picture; or 3 feeble-minded and 2 normal pictures.
Is either of these results significant? The probability of 4 right
selections and one wrong selection by chance is
$\frac{C^{5}_{4} \times C^{5}_{1}}{C^{10}_{5}}$, i.e., the
product of the number of ways 4 rights can be selected from the
5 feeble-minded pictures times the number of ways one wrong can be
selected from the 5 normal pictures divided by the total number of
combinations of 5. Calculation shows this result to be 25/252 or 1/10
(approximately) and hence not significant at the .05 level. The
probability of getting 3 right and 2 wrong is given by
$\frac{C^{5}_{3} \times C^{5}_{2}}{C^{10}_{5}}$, namely,
the product of the number of ways 3 pictures can be selected from 5
(the 5 feeble-minded pictures) times the number of ways 2 pictures
can be selected from the 5 normal pictures divided by the total
number of combinations of 5. This result is 100/252 or slightly greater
than 1/3, and is clearly not significant.

Our subject disproves the null hypothesis, then, only when all 5
feeble-minded pictures are correctly chosen. The probabilities of
various combinations of right and wrong choices are given below;
they should be verified by the student:

Probability of 5R =   1/252
     "      "  4R =  25/252
     "      "  3R = 100/252
     "      "  2R = 100/252
     "      "  1R =  25/252
     "      "  0R =   1/252
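The student's verification of these figures can be sketched in a few lines of Python. The helper name `ways` and its parameters are assumptions made for illustration; the counting rule is the one described in the text (ways of picking the rights times ways of picking the wrongs).

```python
from math import comb

def ways(right: int, targets: int = 5, others: int = 5) -> int:
    """Number of selections of `targets` photos holding exactly `right` correct ones."""
    return comb(targets, right) * comb(others, targets - right)

total = comb(10, 5)  # all possible sets of 5 drawn from 10 photos
for r in range(5, -1, -1):
    print(f"{r}R: {ways(r)}/{total} = {ways(r) / total:.3f}")
```

Calling `ways(r, 10, 10)` with `comb(20, 10)` as the divisor gives the corresponding figures for an experiment with 20 photos.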


It may be noted that by increasing the number of pictures of
feeble-minded and normal from 10 to 20, say, the sensitiveness of
the experiment can be considerably enhanced. With 20 pictures it is
not necessary to get all 10 feeble-minded photos right in order to
achieve a significant result. In fact, 8 right is nearly significant at
the .01 level as shown below.

$$C^{20}_{10} = \frac{20!}{10!\,10!} = 184{,}756$$

Combinations    Frequency    Prob. ratio (freq. ÷ 184,756)
 10R   0W              1        .000005
  9R   1W            100        .0005
  8R   2W           2025        .011
  7R   3W          14400        .078
  6R   4W          44100        .238
  5R   5W          63504        .343
  4R   6W          44100        .238
  3R   7W          14400        .078
  2R   8W           2025        .011
  1R   9W            100        .0005
  0R  10W              1        .000005
                 184,756

* The general formula for the combinations of n things taken r at a time
is $C^{n}_{r} = \frac{n!}{r!\,(n-r)!}$.


3. Testing experimentally observed results against probabilities calcu- 
lated from the normal curve 


When the number of observations or the number of trials is large,
direct calculation of expectations by expanding the binomial
(p + q)^n becomes highly laborious. Since (p + q)^n yields a
distribution (p. 91) which is essentially normal when n is large, in many
experiments the normal curve may be usefully employed to provide
expected results under the null hypothesis. An example will make
the method clear.


Example (3) In answering a test of 100 true-false items, a sub- 
ject gets 60 right. Is it likely that the subject merely guessed? 


As there are only two possible answers to each item, one of which
is right and the other wrong, the probability of a correct answer to
any item is 1/2, and our subject should by chance answer 1/2 of 100
or 50 items correctly. Letting p equal the probability of a right
answer, and q the probability of a wrong answer, we could, by
expanding the binomial (p + q)^100, calculate the probability of various
combinations of rights and wrongs on the null hypothesis. When the
exponent of the binomial (here, number of items) is as large as 100,
however, the resulting distribution is very close to the normal
probability curve (p. 87) and may be so treated with little error.

Figure 49 illustrates the solution of this problem. The mean
of the curve is set at 50. The SD of the probability distribution
found by expanding (p + q)^n is σ = √(npq); hence for (p + q)^100,
σ = √(100 × 1/2 × 1/2) or 5. A score of 60 covers the interval on the
baseline from 59.5 up to 60.5. The lower limit of 60 is 1.9σ removed
from the mean ((59.5 − 50)/5 = 1.9); and from Table A we find that
2.87% of the area of a normal curve lies above 1.9σ.* There are only
three chances in 100 that a score of 60 (or more) would be made if
the null hypothesis were true. A score of 60, therefore, is significant
at the .05 level. We may reject the null hypothesis with some
confidence and conclude that our subject could not have been simply
guessing.
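The arithmetic of Example (3) can be sketched as follows. This is only an illustration: `upper_tail` is an assumed helper that stands in for a lookup in Table A, computed from the error function rather than read from the printed table.

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """Proportion of the normal curve lying above z (one end of the curve only)."""
    return 0.5 * erfc(z / sqrt(2))

n, p, score = 100, 0.5, 60
mean = n * p                      # 50 items expected by guessing
sd = sqrt(n * p * (1 - p))        # sqrt(npq) = 5
z = (score - 0.5 - mean) / sd     # lower limit of 60 is 59.5
prob = upper_tail(z)

print(f"z = {z:.2f}, probability of {score} or more = {prob:.4f}")
```

Subtracting .5 from the score before dividing by the SD is the step that treats 60 as the interval 59.5 to 60.5.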

Note that the problem above could have been solved equally well
in terms of percentages. We should expect our subject to get 50%
of the items right by guessing. The SD of this percentage is
$\sqrt{\frac{50\% \times 50\%}{100}}$ or 5%. A score of 60% (lower
limit 59.5%) is 9.5% or 1.9σ distant from the middle of the curve.
We interpret this result in exactly the same way as that above.

* Note that only one end of the normal curve is used. See page 217.


Example (4) A multiple-choice test of 60 items provides four 
possible responses to each item. How many items should a subject 
answer correctly before we may feel sure that he knows something 
about the test material? 


Since there are four responses to each item, only one of which is 
correct, the probability of a right answer by guessing is 1/4, of a 
wrong answer 3/4. The final score to be expected if a subject knows 
nothing whatever about the test and simply guesses is 1/4 X 60 or 15. 
Our task, therefore, is to determine how much better than 15 a sub- 
ject must score in order to demonstrate real knowledge of the 
material.

This problem can be solved by the methods of Example (1). By
expanding the binomial (p + q)^n, for instance, in which p = 1/4,
q = 3/4, and n = 60, we can determine the probability of the
occurrence of any score from 0 to 60. The direct determination of
probabilities from the binomial expansion is straightforward and exact
but the calculation is tedious. Fortunately, a satisfactory
approximation to the answer we want can be obtained by using the
normal distribution to determine probabilities, as in Example (3).
The mean of our “chance” distribution is 1/4 of 60 or 15; and the
σ = √(npq) = √(60 × 1/4 × 3/4) or 3.35. From Table A we know that
5% of the frequency in a normal distribution lies above 1.65σ.
Multiplying our obtained σ (3.35) by 1.65, we get 5.53; and this value
when added to 15 gives us 20.5 as the point above which lie 5% of
the “chance” distribution of scores. A score of 21 (20.5 to 21.5),
therefore, may be regarded as significant, and if a subject achieves
such a score we can be reasonably sure that he is not merely
guessing.

For a higher level of assurance, we may take that score which
would occur by chance only once in 100 trials. From Table A, 1%
of the frequency in the normal curve lies above 2.33σ. This point is
7.81 (3.35 × 2.33) above 15 or at 22.8. A score of 23, therefore, or a
higher score is very significant; only once in 100 trials would a
subject achieve such a score by guessing.
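The two cutoff points of Example (4) can be reproduced, and compared with the exact binomial probabilities, by a sketch such as the one below. The multipliers 1.65 and 2.33 are those quoted from Table A; the function name `binomial_tail` is an assumption for illustration.

```python
from math import comb, sqrt

def binomial_tail(n: int, k: int, p: float) -> float:
    """Exact probability of k or more right answers out of n by guessing."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))

n, p = 60, 0.25
mean = n * p                      # 15 items expected by guessing
sd = sqrt(n * p * (1 - p))        # about 3.35

# Normal-curve cutoffs, using the Table A multipliers quoted in the text.
cutoff_05 = mean + 1.65 * sd      # about 20.5
cutoff_01 = mean + 2.33 * sd      # about 22.8
print(round(cutoff_05, 1), round(cutoff_01, 1))

# Exact binomial check on the two scores the text singles out.
for score in (21, 23):
    print(score, round(binomial_tail(n, score, p), 4))
```

Because p is 1/4 rather than 1/2, the exact probabilities need not agree perfectly with the normal-curve values; the comparison shows how much is lost by the approximation.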

Use of the normal probability curve in the solution of problems like
this always involves a degree of approximation. When p differs
considerably from 1/2 and n is small, the distribution resulting from the
expansion of (p + q)^n is skewed and is not therefore accurately
described by the normal curve. Under these circumstances one must
resort to the direct determination of probabilities as in Example (1).
When n is large, however, and p not far from 1/2, the normal
distribution can be safely used, as will be shown by the chi-square tests
on page 261.


II. The χ² (Chi-square) Test and the Null Hypothesis


The chi-square test represents a useful method of comparing
experimentally obtained results with those to be expected
theoretically on some hypothesis.* The formula for chi-square (χ²) is stated
as follows:

$$\chi^2 = \sum \left[ \frac{(f_o - f_e)^2}{f_e} \right] \qquad (66)$$

(chi-square formula for testing agreement between
observed and expected results)

in which
f_o = frequency of occurrence of observed or experimentally
      determined facts;
f_e = expected frequency of occurrence on some hypothesis.

The differences between observed and expected frequencies are
squared and divided by the expected number in each case, and the
sum of these quotients is χ². The more closely the observed results
approximate to the expected, the smaller the chi-square and the
closer the agreement between observed data and the hypothesis being
tested. Contrariwise, the larger the chi-square the greater the
probability of a real divergence of experimentally observed from expected
results. To evaluate chi-square, we enter Table E with the computed
value of chi-square and the appropriate number of degrees of
freedom. The number of df = (r − 1)(c − 1) in which r is the number
of rows and c the number of columns in which the data are tabulated.
From Table E we read P, the probability that a χ² as large as the one
obtained would arise if the hypothesis being tested were true. Several
illustrations of the chi-square test will be given in the sections
following.


1. Testing the divergence of observed results from those expected on the
hypothesis of equal probability (null hypothesis)


Example (1) Forty-eight subjects are asked to express their
attitude toward the proposition “Should the United States join a
Security Organization of Nations?” by marking F (favorable),
I (indifferent) or U (unfavorable). Of the members in the group,
24 marked F, 12 I, and 12 U. Do these results indicate a significant
trend of opinion?


The observed data (f_o) are given in the first row of Table 26. In
the second row is the distribution of answers to be expected on the
null hypothesis (f_e), if each answer is selected equally often. Below
the table are entered the differences (f_o − f_e). Each of these
differences is squared and divided by its f_e (64/16 + 16/16 + 16/16) to
give χ² = 6.


TABLE 26
                               Answers
                   Favorable  Indifferent  Unfavorable
Observed (f_o)         24          12           12         48
Expected (f_e)         16          16           16         48
(f_o − f_e)             8           4            4
(f_o − f_e)²           64          16           16
(f_o − f_e)²/f_e        4           1            1

χ² = Σ[(f_o − f_e)²/f_e] = 6     df = 2     P = .05 (Table E)


The degrees of freedom in the table may be calculated from the
formula df = (r − 1)(c − 1) to be (3 − 1)(2 − 1) or 2. Or, the
degrees of freedom may be found directly in the following way: Since
we know the row totals to be 48, when two entries are made in a row
the third is immediately fixed, is not “free.” When the first two
entries in row 1 are 24 and 12, for example, the third entry must be
12 to make up 48. Since we also know the sums of the columns,
only one entry in a column is free, the second being fixed as soon as
the first is tabulated. There are, then, two degrees of freedom for
rows and one degree of freedom for columns, and 2 × 1 = 2 degrees
of freedom for the table.

Entering Table E we find in row df = 2, a χ² of almost 6 (actually,
5.991) in the column headed .05. A P of .05 means that should we
repeat this experiment, only once in 20 trials would a χ² of 6 (or
more) occur if the null hypothesis were true. Our result may be
marked “significant at the .05 level,” therefore, on the grounds that
divergence of observed from expected results is too unlikely of
occurrence to be accounted for solely by sampling fluctuations. We reject
the “equal answer” hypothesis and conclude that our group really
favors the proposition. In general, we may safely discard a null
hypothesis whenever P is .05 or less.
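The computation in Table 26 may be sketched in Python as follows. The function name `chi_square` is an assumption for illustration. In place of Table E the sketch uses the fact that, for 2 degrees of freedom only, the probability of a χ² this large or larger is exp(−χ²/2).

```python
from math import exp

def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories, as in formula (66)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [24, 12, 12]                               # F, I, U
expected = [sum(observed) / len(observed)] * len(observed)   # 16 each
chi2 = chi_square(observed, expected)

# Closed form valid for df = 2 only; other df need Table E or another formula.
p = exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = 2, P = {p:.3f}")
```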


Example (2) The items in an attitude scale are answered by
underlining one of the following phrases: Strongly approve,
approve, indifferent, disapprove, strongly disapprove. The
distribution of answers to an item marked by 100 subjects is shown in
Table 27. Do these answers diverge significantly from the
distribution to be expected if there are no preferences in the group?


TABLE 27
                   Strongly                                    Strongly
                   Approve   Approve  Indifferent  Disapprove  Disapprove
Observed (f_o)        23        18        24          17          18       100
Expected (f_e)        20        20        20          20          20       100
(f_o − f_e)            3         2         4           3           2
(f_o − f_e)²           9         4        16           9           4
(f_o − f_e)²/f_e     .45       .20       .80         .45         .20

χ² = 2.10     df = 4     P lies between .70 and .80


On the null hypothesis of “equal probability” 20 subjects may be
expected to select each of the 5 possible answers. Squaring the
(f_o − f_e)'s, dividing by the expected result (f_e), and summing, we
obtain a χ² of 2.10. df = (5 − 1)(2 − 1) or 4. From Table E,
reading across from row df = 4, we locate a χ² of 2.195 in column .70.
This χ² is nearest to our calculated value of 2.10, which lies between
the entries in columns .70 and .80. It is sufficiently accurate to
describe P as lying between .70 and .80 without interpolation. Since
this much divergence from the null hypothesis, namely, 2.10, can be
expected to occur upon repetition of the experiment in approximately
75% of the trials, χ² is clearly not significant and we must retain the
null hypothesis. There is no conclusive evidence of either a strongly
favorable or unfavorable attitude toward this item.
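The reading of P from Table E in Example (2) can be checked numerically. The sketch below relies on a closed form for the upper-tail probability of χ² that holds when df is even; the function names are illustrative assumptions, not the book's notation.

```python
from math import exp, factorial

def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P of a chi-square of x or more; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

observed = [23, 18, 24, 17, 18]
expected = [20] * 5
chi2 = chi_square(observed, expected)
p = chi2_upper_tail_even_df(chi2, df=4)
print(f"chi-square = {chi2:.2f}, df = 4, P = {p:.2f}")
```

A computed P falling between .70 and .80 agrees with the bracketing obtained from the table without interpolation.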


2. Testing the divergence of observed results from those expected on the 
hypothesis of a normal distribution 


Our hypothesis may assert that the frequencies of an event which
we have observed really follow the normal distribution instead of
being equally probable. An example illustrates how this hypothesis
may be tested by chi-square.


Example (3) Forty-two salesmen have been classified into 3
groups (very good, satisfactory, and poor) by a consensus of sales
managers. Does this distribution of ratings differ significantly
from that to be expected if selling ability is normally distributed?


TABLE 28
                     Good   Satisfactory   Poor
Observed (f_o)        16         20          6        42
Expected (f_e)        6.7        28.6        6.7      42
(f_o − f_e)           9.3         8.6         .7
(f_o − f_e)²         86.49       73.96        .49
(f_o − f_e)²/f_e     12.90        2.59        .07

χ² = 15.56     df = 2     P is less than .01


The entries in row 1 give the number of men classified in each of
the 3 categories. In row 2, the entries show how many of the 42
salesmen may be expected to fall in each category on the hypothesis of a
normal distribution. These last entries were found by dividing the
baseline of a normal curve (taken to extend over 6σ) into 3 equal
segments of 2σ each. From Table A, the proportion of the normal
distribution to be found in each of these segments is as follows:

                                    Proportion
Between +3.00σ and +1.00σ               .16
   "    +1.00σ and −1.00σ               .68
   "    −1.00σ and −3.00σ               .16
                                       1.00

These proportions of 42 have been calculated and are entered in
Table 28. The χ² in the table is 15.56 and df = (3 − 1)(2 − 1) or
2. From Table E it is clear that this χ² lies beyond the limits of the
table, hence P is listed simply as less than .01. The discrepancy
between observed and expected values is so great that the hypothesis
of a normal distribution of selling ability must be rejected. Too
many men have been described as good, and too few as satisfactory,
to make for agreement with our hypothesis.
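The steps of Example (3) can be sketched as follows. The three proportions are taken as rounded in the text (.16, .68, .16), and the expected frequencies are rounded to one decimal as in Table 28; both choices are assumptions made so that the sketch follows the book's arithmetic rather than exact normal-curve areas.

```python
from math import exp

observed = [16, 20, 6]                  # good, satisfactory, poor
proportions = [0.16, 0.68, 0.16]        # normal-curve segments, as rounded in the text
n = sum(observed)

# Expected frequencies on the normal hypothesis, rounded as in Table 28.
expected = [round(prop * n, 1) for prop in proportions]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p = exp(-chi2 / 2)                      # closed form valid for df = 2 only
print(expected, round(chi2, 2), p < 0.01)
```

Small differences in the second decimal of χ² can arise from rounding the separate quotients before they are summed.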


3. The chi-square test when table entries are small 


When the entries in a table are fairly large, χ² gives an estimate of
divergence from hypothesis which is close to that obtained by other
measures of probability. But χ² is not stable when computed from a
table in which any experimental frequency is less than 5. Moreover,
when the table is 2 × 2 fold (when df = 1), χ² is subject to
considerable error unless a correction for continuity (called Yates' correction)
is made. Reasons for making this correction and its effect upon χ²
can best be seen by working through the examples following.


Example (4) In Example (1), page 248, an observer gave seven 
correct judgments in ten trials. The probability of a right judg- 
ment was 1/2 in each instance, so that the expected number of 
correct judgments was five. Test our subject’s deviation from the 


null hypothesis by computing chi-square and compare the P with 
that found by direct calculation. 


TABLE 29
                      Right    Wrong
Observed (f_o)          7        3       10
Expected (f_e)          5        5       10
(f_o − f_e)             2        2
Correction (−.5)       1.5      1.5
(f_o − f_e)²           2.25     2.25
(f_o − f_e)²/f_e        .45      .45

χ² = .90     P = .356
df = 1       ½P = .178 (by interpolation in Table E)


Calculations in Table 29 follow those of previous tables except for
the correction which consists in subtracting .5 from each (f_o − f_e)
difference. In applying the χ²-test we assume that adjacent
frequencies are connected by a continuous and smooth curve (like the
normal curve) and are not discrete numbers. In 2 × 2 fold tables,
especially when entries are small, the χ² curve is not continuous.
Hence, the deviation of 7 from 5 must be written as 1.5 (6.5 − 5)
instead of 2 (7 − 5), as 6.5 is the lower limit of 7 in a continuous
series. In like manner the deviation of 3 from 5 must be taken from
the upper limit of 3, namely, 3.5 (see Fig. 49). Still another change
in procedure must be made in order to have the probability obtained
from χ² agree with the direct determination of probability. P in the
χ² table gives the probability of 7 or more right answers and of 3 or
less right answers, i.e., takes account of both ends of the probability
curve (see p. 217). We must take 1/2 of P, therefore, if we want only
the probability of 7 or more right answers. Note that the P/2 of .178
is very close to the P of .172 got by the direct method on page 249.
If we repeated our test we should expect a score of 7 or better about
17 times in 100 trials. It is clear, therefore, that the obtained score
is not significant and does not refute the null hypothesis.

It should be noted that had we omitted the correction for
continuity, chi-square would have been 1.60 and P/2 (by interpolation in
Table E), .095. Failure to use the correction causes the probability
of a given result to be greatly underestimated and the chances of its
being called significant considerably increased.
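The effect of the correction for continuity can be seen by computing χ² both ways. In the sketch below the function names are illustrative assumptions, and P for df = 1 is computed from the error function instead of being interpolated in Table E, so the printed probabilities need not match the interpolated figures exactly.

```python
from math import erfc, sqrt

def chi_square_two_cells(observed, expected, yates=True):
    """Chi-square for two categories, optionally reducing each |f_o - f_e| by .5."""
    total = 0.0
    for o, e in zip(observed, expected):
        d = abs(o - e)
        if yates:
            d = max(d - 0.5, 0.0)
        total += d * d / e
    return total

def chi2_upper_tail_df1(x: float) -> float:
    """P of a chi-square of x or more when df = 1 (both ends of the curve)."""
    return erfc(sqrt(x / 2))

observed, expected = [7, 3], [5, 5]
corrected = chi_square_two_cells(observed, expected, yates=True)
uncorrected = chi_square_two_cells(observed, expected, yates=False)

for label, x in (("with correction", corrected), ("without", uncorrected)):
    print(f"{label}: chi-square = {x:.2f}, P/2 = {chi2_upper_tail_df1(x) / 2:.3f}")
```

Halving P gives the probability for one end of the curve only, which is the figure to set beside the direct binomial result.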

When the expected entries in a 2 × 2 fold table are the same (as
in Tables 29, 30) the formula for chi-square may be written in a
somewhat shorter form as follows:

$$\chi^2 = \frac{2\,(f_o - f_e)^2}{f_e} \qquad (67)$$

(short formula for χ² in 2 × 2 fold tables when expected
frequencies are equal)

Applying formula (67) to Table 29, with the corrected difference of
1.5, we get a chi-square of 2(1.5)²/5 = .90.


Example (5) In Example (3), page 251, a subject achieved a
score of 60 right on a test of 100 true-false items. From the
chi-square test, determine whether this subject was merely guessing.
Compare your result with that found on page 252 when the normal
curve hypothesis was employed.


TABLE 30
                      Right    Wrong
Observed (f_o)         60       40       100
Expected (f_e)         50       50       100
(f_o − f_e)            10       10
Correction (−.5)       9.5      9.5
(f_o − f_e)²          90.25    90.25
(f_o − f_e)²/f_e       1.81     1.81

χ² = 3.62     P = .059
df = 1        ½P = .0295 or .03


Although the cell entries in Table 30 are large, use of the
correction for continuity will be found to yield a result in somewhat closer
agreement with that found on page 252 than can be obtained without
the correction. As shown in Figure 49, page 252, the probability of a
deviation of 60 or more from 50 is that part of the curve lying above
59.5. In Table E, the P of .059 gives us the probability of scores of
60 or more and of 40 or less. Hence we must take 1/2 of P (i.e.,
.0295) to give us the probability of a score of 60 or more. Agreement
between the probability given by the χ²-test and by direct
calculation is very close. Note that when χ² is calculated without the
correction, we get a P/2 of .024, a slight underestimation. In general, the
correction for continuity has little effect when table entries are large,
50 or more, say. But failure to use the correction even when numbers
are large may lead to some underestimation of the probability; hence
it is generally wise to use it.
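The close agreement between the two methods is no accident: with two categories and df = 1, the corrected χ² is simply the square of the normal-curve deviate computed from the same corrected difference. A brief sketch, with variable names chosen only for illustration, makes the identity visible for Example (5).

```python
from math import erfc, sqrt

n, p = 100, 0.5
f_o_right = 60
f_e_right, f_e_wrong = n * p, n * (1 - p)

d = abs(f_o_right - f_e_right) - 0.5            # corrected difference, 9.5
chi2 = d * d / f_e_right + d * d / f_e_wrong    # the two cells share the same difference
z = d / sqrt(n * p * (1 - p))                   # deviate used with the normal curve

half_p_from_chi2 = erfc(sqrt(chi2 / 2)) / 2     # one end of the curve only
p_from_normal = 0.5 * erfc(z / sqrt(2))

print(round(chi2, 2), round(z * z, 2), round(half_p_from_chi2, 4), round(p_from_normal, 4))
```

Any remaining difference between the tabled figures for the two methods therefore comes from rounding and interpolation in the printed tables.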


Example (6) In Example (4), page 253, given a multiple-choice
test of 60 items (four possible answers to each item) we were
required to find what score a subject must achieve in order to
demonstrate knowledge of the test material. By use of the normal
probability distribution, it was shown that a score of 21 is reasonably
significant and a score of 23 highly significant. Can these results
be verified by the chi-square test?


In Table 31 an obtained score of 21 is tested against an expected
score of 15. In the first line of the table the observed values (f_o) are
21 right and 39 wrong; in the second line, the expected or “guess”
values are 15 right and 45 wrong. Making the correction for
continuity we obtain a χ² of 2.69, a P of .10 and 1/2 P of .05. Only once
in 20 trials would we expect a score of 21 or higher to occur if the
subject were merely guessing and had no knowledge of the test
material. This result checks that obtained on page 253.


TABLE 31
                        R        W
f_o                    21       39       60
f_e                    15       45       60
(f_o − f_e)             6        6
Correction (−.5)       5.5      5.5
(f_o − f_e)²          30.25    30.25
(f_o − f_e)²/f_e       2.02      .67

χ² = 2.69     P = .10
df = 1        ½P = .05

In Table 32 a score of 23 is tested against the expected score of
15. Making the correction for continuity, we obtain a χ² of 5.00
which yields a P of .0275 and 1/2 P of .0138. Again this result closely
checks the answer obtained on page 253 by use of the normal
probability curve.


TABLE 32
                        R        W
f_o                    23       37       60
f_e                    15       45       60
(f_o − f_e)             8        8
Correction (−.5)       7.5      7.5
(f_o − f_e)²          56.25    56.25
(f_o − f_e)²/f_e       3.75     1.25

χ² = 5.00     P = .0275
df = 1        ½P = .0138 or .01


4. The chi-square test when table entries are in percentages


The chi-square test should not be used with percentage entries
unless a correction for size of sample is made. This follows from
the fact that in dealing with probability the significance of an event
depends upon its actual frequency and is not shown by its percentage
occurrence. For a penny to fall heads eight times in ten tosses is not
as significant as for the penny to fall heads eighty times in 100
tosses, although the percentage occurrence is the same in both cases.
If we write the entries in Table 29 as percentages, we have


                         R        W
f_o                     70%      30%     100%
f_e                     50%      50%     100%
(f_o − f_e)             20%      20%
Correction* (−5%)       15%      15%
(f_o − f_e)²            225      225

χ² (percent) = 2(225)/50 = 9          by (67)
χ² = 9 × 10/100 = .90                 (Table 29)


It is clear that in order to bring χ² to its proper value in terms of
original numbers we must multiply the “percent” χ² by 10/100 to
give .90. A χ² calculated from percentages must always be multiplied
by N/100 (N = number of observations) in order to adjust it to the
actual frequencies in the given sample.
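The adjustment by N/100 can be sketched as follows. The function name and its arguments are assumptions for illustration; the correction for continuity is converted to percent units (.5 of one case out of N) before it is subtracted, as the text does when it writes −5% for N = 10.

```python
def chi_square_from_percents(obs_pct, exp_pct, n, yates=True):
    """Chi-square from percentage entries, rescaled to the actual sample size n."""
    correction = 50.0 / n if yates else 0.0   # .5 of a case expressed in percent
    chi2_pct = 0.0
    for o, e in zip(obs_pct, exp_pct):
        d = max(abs(o - e) - correction, 0.0)
        chi2_pct += d * d / e
    return chi2_pct * n / 100                 # multiply the "percent" value by N/100

# Table 29 written as percentages: 7 right and 3 wrong in 10 trials.
print(round(chi_square_from_percents([70, 30], [50, 50], n=10), 2))
# The same percentages of right answers would mean far more with N = 100.
print(round(chi_square_from_percents([70, 30], [50, 50], n=100), 2))
```

The second call shows why the sample size cannot be ignored: identical percentages give a much larger χ² when they rest on ten times as many observations.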


5. The chi-square test of independence in contingency tables 


We have seen that χ² may be employed to test the agreement
between observed results and those expected on some hypothesis. A
further useful application of χ² can be made when we wish to
investigate the relationship between traits or attributes which can
be classified into two or more categories. The same persons, for
example, may be classified as to hair color (light, brown, black, red)
and as to eye color (blue, gray, brown), and the correspondence in
these attributes noted. Or fathers and sons may be classified with
respect to interests or temperament or achievement and the
relationship of the attributes in the two groups studied.

Table 33 is a contingency table, i.e., a double entry or two-way
table in which the possession by a group of varying degrees of two
characteristics is represented. In the tabulation in Table 33, 413
persons have been classified as to “eyedness” and “handedness.”
Eyedness, or eye dominance, is described as left-eyed, ambiocular, or
right-eyed; handedness as left-handed, ambidextrous, or
right-handed. Reading down the first column we find that of 118 left-eyed
persons, 34 are left-handed, 27 ambidextrous and 57 right-handed.
Across the first row we find 124 left-handed persons, of whom 34 are
left-eyed, 62 ambiocular and 28 right-eyed. The other columns and
rows are interpreted in the same way.

* From Table 29 it is clear that the correction of −.5 is −.5/N or
−.05; this is −5% when entries in the table are expressed as percents.


TABLE 33 Comparison of eyedness and handedness in 413 persons *

                 Left-Eyed     Ambiocular     Right-Eyed    Totals
Left-handed      34 (35.4)     62 (58.5)      28 (30.0)      124
Ambidextrous     27 (21.4)     28 (35.4)      20 (18.2)       75
Right-handed     57 (61.1)    105 (101.0)     52 (51.8)      214
Totals             118           195            100          413

I. Calculation of independence values (f_e):
(118 × 124)/413 = 35.4    (195 × 124)/413 = 58.5     (100 × 124)/413 = 30.0
(118 × 75)/413  = 21.4    (195 × 75)/413  = 35.4     (100 × 75)/413  = 18.2
(118 × 214)/413 = 61.1    (195 × 214)/413 = 101.0    (100 × 214)/413 = 51.8

II. Calculation of χ²:
(−1.4)² ÷ 35.4 =  .055    (3.5)² ÷ 58.5   =  .209    (−2.0)² ÷ 30.0 = .133
(5.6)² ÷ 21.4  = 1.465    (−7.4)² ÷ 35.4  = 1.547    (1.8)² ÷ 18.2  = .178
(−4.1)² ÷ 61.1 =  .275    (4.0)² ÷ 101.0  =  .158    (.2)² ÷ 51.8   = .001

χ² = 4.02     df = 4     P lies between .30 and .50


The hypothesis to be tested is the null hypothesis, namely, that
handedness and eyedness are essentially unrelated or independent.
In order to compute χ² we must first calculate an “independence
value” for each cell in the contingency table. Independence values
are represented by figures in parentheses within the different cells;
they give the number of people whom we should expect to find
possessing the designated eyedness and handedness combinations in the
absence of any real association. The method of calculating
independence values is shown in Table 33. To illustrate with the first
entry, there are 118 left-eyed and 124 left-handed persons. If there
were no association between left-eyedness and left-handedness we
should expect to find, by chance, (118 × 124)/413 or 35.4 individuals
in our group who are left-eyed and left-handed. The reason for this may
readily be seen. We know that 118/413 of the entire group is
left-eyed. This proportion of left-eyed individuals should hold for any
sub-group, if there is no dependence of eyedness on handedness.
Hence, 118/413 or 28.5% of the 124 left-handed individuals, i.e.,
35.4, should also be left-eyed. Independence values for all cells are
shown in Table 33.

* From Woo, T. L., Biometrika, 1936, 20A, pp. 79-118.

When the expected or independence values have been computed,
we find the difference between the observed and expected values for
each cell, square each difference and divide in each instance by the
independence value. The sum of these quotients by formula (66)
gives χ². In the present problem χ² = 4.02 and df = (3 − 1)(3 − 1)
or 4. From Table E we find that P lies between .30 and .50 and hence
χ² is not significant. The observed results are close to those to be
expected on the hypothesis of independence and there is no evidence
of any real association between eyedness and handedness within our
group.
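The two steps of Table 33 (independence values, then χ²) can be sketched in Python as below. The function name is an assumption for illustration, and the cell counts are those read from Table 33. Independence values are carried at full precision here, so the result may differ from the hand calculation in the last decimal.

```python
from math import exp

def independence_chi_square(table):
    """Return (chi-square, df, expected values) for a two-way contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Independence value for a cell = (row total x column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Rows: left-handed, ambidextrous, right-handed.
# Columns: left-eyed, ambiocular, right-eyed.
observed = [[34, 62, 28], [27, 28, 20], [57, 105, 52]]
chi2, df, expected = independence_chi_square(observed)

# Closed form for the upper-tail probability, valid for df = 4 only.
p = exp(-chi2 / 2) * (1 + chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, P = {p:.2f}")
```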


When the contingency table is 2 × 2 fold, χ² may be calculated
without first computing the four expected frequencies (the four
independence values). Example (7) illustrates the method.


Example (7) All of the sixth-grade children in a public-school 
system are given a standard achievement test in arithmetic. A 
sample of 40 boys, drawn at random from the sixth-grade popula- 
tion, showed 23 at or above the national norm in the test and 17 
below the national norm. A random sample of 50 sixth-grade girls 
showed 22 at or above the national norm and 28 below. Are the 


boys really better than the girls in arithmetic? Data are arranged
in a fourfold table as follows.

            Below norm     At or above norm
Boys          17 (A)           23 (B)           40 (A + B)
Girls         28 (C)           22 (D)           50 (C + D)
              45 (A + C)       45 (B + D)       90 (N)

In a fourfold table, chi-square is given by the following formula.*

$$\chi^2 = \frac{N\,(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (68)$$

(Chi-square in a fourfold contingency table)

Substituting for A, B, C, D in the formula, we have

$$\chi^2 = \frac{90\,(374 - 644)^2}{40 \times 50 \times 45 \times 45} = 1.62$$

and for df = 1, P is almost .20. χ² is not significant and there is no
evidence that the table entries really vary from expectation, i.e., that
there is a true sex difference in arithmetic.
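Formula (68) lends itself to a short sketch. The function name `fourfold_chi_square` is an assumption for illustration; the arguments follow the cell lettering of the fourfold table (A and B in the first row, C and D in the second), and P for df = 1 is computed from the error function rather than read from Table E.

```python
from math import erfc, sqrt

def fourfold_chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for a 2 x 2 table by formula (68), without expected frequencies."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Boys: 17 below norm, 23 at or above; girls: 28 below, 22 at or above.
chi2 = fourfold_chi_square(17, 23, 28, 22)
p = erfc(sqrt(chi2 / 2))          # both ends of the curve, df = 1
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")
```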


6. The additive property of χ²


When several χ²'s have been computed from independent
experiments (i.e., from tables based upon different samples), these may be
summed to give a new chi-square with df = the sum of the separate
df's. The fact that chi-squares may be added to provide an over-all
test of a hypothesis is important in many experimental studies. In
Example (7) above we have seen that the boys did slightly better
than the girls on the arithmetic achievement test, but the chi-square
of 1.62 is not large enough to indicate a superiority of boys over
girls. Suppose that three repetitions of this experiment are carried
out, in each instance groups of boys and girls [of about the same size
as in Example (7)] being drawn independently and at random from
the sixth grade and listed as scoring “at or above” or “below” the
national norm. Suppose further that the three chi-squares from
these tables are 2.71, 5.39 and .15, in each case the boys being
somewhat better than the girls. We can now combine these four results
to get an over-all test of the significance of this sex difference in
arithmetic. Adding the three χ²'s to the 1.62 in Example (7) we have
a total χ² of 9.87 with 4 df's. From Table E this χ² is significant at
the .05 level, and we may be reasonably sure that sixth-grade boys
are, on the average, better than sixth-grade girls in arithmetic. It
will be noted that our four experiments taken in aggregate yield a
significant result, although only one of the χ²'s (5.39) is itself
significant. Combining the data from several experiments will often
yield a definitive result when the separate experiments taken alone
provide only indications or suggestions of a true difference.

* See page 367 for relation of χ² to phi-coefficient.

PROBLEMS 


1. Two sharp clicking sounds are presented in succession, the second being
   always more intense or less intense than the first. Presentation is in
   random order. In eight trials an observer is right six times. Is this
   result significant?
   (a) Calculate P directly (p. 249).
   (b) Check P found in (a) by χ²-test (p. 258). Compare P's found
       with and without correction for continuity.


2. A multiple-choice test of fifty items provides five responses to each 
item. How many items must a subject answer correctly 
(a) to reach the .05 confidence level? 
(b) to reach the .01 confidence level? 


3. A multiple-choice test of thirty items provides three responses for each
   item. How many items must a subject answer correctly before the
   chances are only one in fifty that he is merely guessing?


4. A pack of fifty-two playing cards contains four suits (diamonds, clubs,
   spades, and hearts). A subject “guesses” through the pack of cards,
   naming only suits, and is right eighteen times.
   (a) Is this result better than “chance”? (Hint: In using the
       probability curve compute area to 17.5, lower limit of 18.0, rather
       than to 18.0.)
   (b) Check your answer by the χ²-test (p. 257).


5. Twelve samples of handwriting, six from normal and six from insane
   adults, are presented to a graphologist who claims he can identify the
   writing of the insane. How many “insane” specimens must he recognize
   correctly in order to prove his contention?


6. The following judgments were classified into six categories taken to
represent a continuum of opinion:

                          Categories
               I    II   III    IV     V    VI   Total
Judgments:    48    61    82    91    57    45     384


(a) Test given distribution versus "equal probability" hypothesis.
(b) Test given distribution versus normal distribution hypothesis.


7. In 120 throws of a single die, the following distribution of faces was
obtained:

                                   Faces
                          1     2     3     4     5     6   Total
Observed frequencies:    30    25    18    10    22    15     120


Do these results constitute a refutation of the "equal probability"
(null) hypothesis?


8. The following table represents the number of boys and the number of
girls who chose each of the five possible answers to an item in an attitude
scale.

          Approve                                       Disapprove
          Strongly   Approve   Indifferent  Disapprove  Strongly    Total
Boys         25         30         10           25         10        100
Girls        10         15          5           15         15         60


Do these data indicate a significant sex difference in attitude toward
this question? [Note: Test the "independence (null) hypothesis."]


9. The table below shows the number of normals and abnormals who
chose each of the three possible answers to an item on a neurotic questionnaire.

              Yes     No      ?     Total
Normals        14     66     10       90
Abnormals      27     66      7      100
               41    132     17      190


Does this item differentiate between the two groups? Test the independence
hypothesis.

10. From the table below, determine whether Item 27 differentiates between
two groups of high and low general ability.


Numbers of Two Groups Differing in General
Ability Who Pass Item 27 in a Test

                 Passed   Failed   Total
High Ability       31       19       50
Low Ability        24       26       50
                   55       45      100


11. Five χ²'s computed from fourfold tables in independent replications of
an experiment are .50, 4.10, 1.20, 2.79 and 5.41. Does the aggregate of
these tests yield a significant χ²?

ANSWERS

1. (a) P = .145; not significant
   (b) P = .145 when corrected; .085 uncorrected
2. (a) 15
   (b) 17
3. 15
4. Probability of 18 or better is .08; not significant
5. 5 or 6 (Probability of 5 or 6 = 37/924 = .04)
6. (a) χ² = 27; P less than .01 and hypothesis of "equal probability"
   must be discarded.
   (b) χ² = 356; P is less than .01, and the deviation from the normal
   hypothesis is significant.
7. Yes. χ² = 12.90, df = 5, and P is between .02 and .05.
8. No. χ² = 7.03, df = 4, and P is between .20 and .10.
9. No. χ² = 4.14, df = 2, and P is between .20 and .10.
10. No. χ² = 1.98, df = 1, and P lies between .20 and .10.
11. Yes. χ² = 14.00, df = 5, and P lies between .02 and .01.
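
Two of the answers above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the die frequencies are those of the 120-throw problem, and the fourfold table is that of Item 27. No continuity correction is applied.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_independence(table):
    """Chi-square and df for a rows-by-columns contingency table.

    Expected frequencies come from the marginal totals
    (row total x column total / N), i.e. the independence hypothesis.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

# Die problem: 120 throws, "equal probability" gives 20 per face.
die_observed = [30, 25, 18, 10, 22, 15]
chi2_die = chi_square_goodness_of_fit(die_observed, [20] * 6)

# Item 27 problem: rows are High and Low ability, columns Passed and Failed.
item_27 = [[31, 19], [24, 26]]
chi2_item, df_item = chi_square_independence(item_27)

print(round(chi2_die, 2), round(chi2_item, 2), df_item)
```

The two values agree with the χ² of 12.90 and 1.98 listed in the answers.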


ANALYSIS OF VARIANCE IN DETERMINING
THE SIGNIFICANCE OF DIFFERENCES
BETWEEN MEANS


The methods described under analysis of variance include (1) a
variety of procedures called experimental designs, as well as (2) certain
statistical techniques devised for use with these procedures. The
statistics used in analysis of variance are not new (as they are sometimes
thought to be) but are, in fact, adaptations of formulas and
methods described earlier in this book. The experimental designs, on
the other hand, are in several instances new at least to psychology.
These systematic approaches often provide more efficient and exact
tests of experimental hypotheses than do the conventional methods
ordinarily employed.

This chapter will be concerned with the application of analysis of
variance to the important and often-encountered problem of determining
the significance of the difference between means. This topic
has been treated by classical methods in Chapter 9, and the present
chapter will give the student an opportunity to contrast the relative
efficiency of the two approaches and to gain, as well, some notion of
the advantages and disadvantages of each. Treatment of other and
more complex experimental designs through analysis of variance is
beyond the scope of this book. After this introductory chapter, however,
the interested student should be able to follow the more comprehensive
treatments of analysis of variance in the references listed
below.*

* Edwards, A. L., Experimental Design in Psychological Research (New York: Rinehart and Co., 1950).
McNemar, Q., Psychological Statistics (New York: John Wiley and Sons).
Lindquist, E. F., Statistical Analysis in Educational Research (Boston: Houghton Mifflin Co.).
Snedecor, G. W., Statistical Methods (4th ed.; Ames, Iowa: Iowa State College Press, 1946).
Goulden, C. H., Methods of Statistical Analysis (New York: John Wiley and Sons).
Fisher, R. A., Statistical Methods for Research Workers (8th ed.; London: Oliver and Boyd, 1941).
Fisher, R. A., The Design of Experiments (London: Oliver and Boyd, 1935).
(The Fisher references will be difficult for the beginner.)

The plan of this chapter is to give, first, an elementary account of
the principles of variance analysis. The problem of determining the
significance of the difference between two means will then be considered:
(1) when the means are independent, i.e., when the sets of
measures from which the M's are derived are uncorrelated, and (2)
when M's are not independent because of correlation among the
different sets of measures or scores.


I. How Variance Is Analyzed

1. When pairs of scores are added to yield a composite score


While the variability within a set of scores is ordinarily given by
the standard deviation or σ, variability may also be expressed by the
"variance" or σ². A very considerable advantage of variances over
SD's is the fact that variances are often additive and the sums of
squares upon which variances are based always are. A simple example
will illustrate this. Suppose that we add the two independent
(uncorrelated) scores X and Y made by Subject A on tests X and Y
to give the composite score Z (i.e., Z = X + Y). Now if we add the
X and Y scores for each person in our group, after expressing each
score as a deviation from its own mean, we will have for any subject
that

$$z = x + y$$

in which $z = Z - M_z$, $x = X - M_x$, and $y = Y - M_y$.

Squaring both sides of this equation, and summing for all subjects
in the group, we find in general that

$$\Sigma z^2 = \Sigma x^2 + \Sigma y^2$$

The cross product term $2\Sigma xy$ * drops out as x and y are independent
(uncorrelated) by hypothesis. Hence we find that the sum of the
squares in x plus the sum of the squares in y equals the sum of the
squares in z. Dividing by N, we have

$$\frac{\Sigma z^2}{N} = \frac{\Sigma x^2}{N} + \frac{\Sigma y^2}{N}$$

or

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2$$

Also

$$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$$

* The formula is $r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$ (p. 139). If r = 0, $\Sigma xy$ must also be zero.

The equation in terms of variances is more convenient and more
useful than is the equation in terms of SD's. Thus if we divide each
variance by $\sigma_z^2$, we have

$$\frac{\sigma_x^2}{\sigma_z^2} + \frac{\sigma_y^2}{\sigma_z^2} = 1$$

which tells us what proportion of the variance of the composite Z is
attributable to the variance of X and what proportion is attributable
to the variance of Y. This division of total variability into its independent
components cannot be readily done with SD's.
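
The additivity of variances for uncorrelated scores can be seen in a small numerical case. The sketch below is illustrative only: the four pairs of scores are invented so that the cross-product sum is exactly zero, and the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations from the mean of the list."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def variance(values):
    """Variance as used in this section: sum of squares divided by N."""
    return sum_of_squares(values) / len(values)

def cross_product_sum(xs, ys):
    """Sum of products of paired deviations; zero when r = 0."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Invented scores for four subjects, chosen so that X and Y are uncorrelated.
X = [4, 4, 6, 6]
Y = [1, 3, 1, 3]
Z = [x + y for x, y in zip(X, Y)]   # composite score Z = X + Y

var_x, var_y, var_z = variance(X), variance(Y), variance(Z)
share_x = var_x / var_z   # proportion of composite variance due to X
share_y = var_y / var_z   # proportion of composite variance due to Y

print(cross_product_sum(X, Y), var_x, var_y, var_z, share_x, share_y)
```

With correlated scores the cross-product sum would not vanish and the variance of Z would no longer equal the simple sum of the two variances.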


2. When two sets of scores are combined into a single distribution


The breakdown of total variability into its contributing parts may
be approached in another way. When two sets of scores, A and B,
are thrown together or combined into a single distribution (see
p. 57), the sum of the squares of all of the scores taken from the $M_T$
of the single total distribution is related to the component distributions
A and B as follows:

$$\Sigma x_T^2 = \Sigma x_A^2 + \Sigma x_B^2 + N_A d_A^2 + N_B d_B^2$$

where $\Sigma x_T^2$ = SS * of deviations in total distribution T from $M_T$
      $\Sigma x_A^2$ = SS of deviations in distribution A from $M_A$
      $\Sigma x_B^2$ = SS of deviations in distribution B from $M_B$

$N_A$ and $N_B$ are the numbers of scores in distributions A and B,
respectively; $d_A$ and $d_B$ are the deviations of the means of A and B
from the mean of T, i.e., $(M_A - M_T)^2 = d_A^2$, $(M_B - M_T)^2 = d_B^2$.

* SS = sum of squares.

The equation given above in terms of $\Sigma x_T^2$ is important in the
present connection because it shows that the sum of the squares of
deviations around the mean of a single distribution made up of two
component distributions can be broken down into two parts: (1) the
SS around the M's of the two sets of scores, viz., $M_A$ and $M_B$, and
(2) the sum of squares (times the appropriate N's) of the deviations
of $M_A$ and $M_B$ from $M_T$. An illustration will make the application of
this result to variance analysis clearer.

Table 34 shows three sets of scores, 5 for group A, 10 for group
B, and 15 for group T which is made up of A and B. The sums of
scores, the means and SS around the M's have been calculated for
each group. It may be noted that $M_T = \frac{18 \times 5 + 21 \times 10}{15} = 20$; and
that, in general,

$$M_T = \frac{M_A \times N_A + M_B \times N_B}{N_A + N_B}$$

TABLE 34 A and B are two distributions and T is a combination of the
two

        Distribution A    Distribution B    Distribution T (A + B)

              25                17                   25
              15                20                   15
              18                26                   18
              22                18                   22
              10                20                   10
                                25                   17
                                19                   20
                                26                   26
                                18                   18
                                21                   20
                                                     25
                                                     19
                                                     26
                                                     18
                                                     21

Sum           90               210                  300
M             18                21                   20
Σx²          138               106                  274


Substituting the data from Table 34 in the sums equation above
we find that

   274 = 138 + 106 + 5(18 - 20)² + 10(21 - 20)²
or 274 = 138 + 106 + 20 + 10

Of the total SS (274), 244 (138 + 106) is contributed by the variability
within the two distributions A and B, and 30 (20 + 10) is contributed
by the variability between the means of the two distributions.
This breakdown of total SS into the SS's within component
distributions and between the M's of the combining distributions is
fundamental to analysis of variance. The method whereby SS's can
be expressed as variances will be shown later.
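
The breakdown just described can be verified directly from the scores. The following sketch is illustrative and not part of the original text; it assumes the scores of Table 34 as read from the table (they reproduce the sums 90 and 210 and the SS of 138 and 106 given there), and the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations of the scores from their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

# Scores of Table 34 (assumed reading of the table).
A = [25, 15, 18, 22, 10]
B = [17, 20, 26, 18, 20, 25, 19, 26, 18, 21]
T = A + B   # the combined distribution

ss_total = sum_of_squares(T)
ss_within = sum_of_squares(A) + sum_of_squares(B)
# Squared deviation of each group mean from the mean of T, weighted by N.
ss_between = (len(A) * (mean(A) - mean(T)) ** 2
              + len(B) * (mean(B) - mean(T)) ** 2)

print(ss_total, ss_within, ss_between)
```

The within and between parts add up to the total SS, which is the identity that analysis of variance rests upon.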


II. The Significance of the Difference between Means
Derived from Independent or Uncorrelated
Measures or Scores


1. When there are more than two means to be compared 


The value of analysis of variance in testing experimental hypotheses
is most strikingly demonstrated in those problems in which the
significance of the differences among several means is desired. An
example will illustrate the procedures and will provide a basis for the
discussion of certain theoretical points.


Example (1) Assume that we wish to study the effects of eight 
different experimental conditions, designated A, B, C, D, E, F, G, 
H, upon performance on a sensory-motor task. From a total of 48 
subjects, 6 are assigned at random to each of 8 groups and the same 
test is administered to all. Do the mean scores achieved under the 
8 experimental conditions differ significantly? 


Records for the 8 groups are shown in parallel columns in Table 35.
Individual scores are listed under the 8 headings which designate the
conditions under which the test was given. Since "conditions" furnishes
the category for the assignment of subjects, in the terminology
of analysis of variance there is said to be one criterion of classification.
The first step in our analysis is a breakdown of the total variance
(σ²) of the 48 scores into two parts: (1) the variance attributable
to the different conditions, or the variance among the 8 means,
and (2) the variance arising from individual differences within the
8 groups. The next step is to determine whether the group means
differ significantly inter se in view of the variability within the separate
groups (individual differences). A detailed account of the
calculations required (see Table 35) is set forth in the steps on
pages 275-279.

TABLE 35 A hypothetical experiment in which 48 subjects are assigned
at random to 8 groups of 6 subjects each. These groups are
tested under 8 different experimental conditions, designated
respectively A, B, C, D, E, F, G and H.

                               Conditions
          A     B     C     D     E     F     G     H

Sums     432   468   492   510   390   456   366   372    Grand Sum: 3486
M's       72    78    82    85    65    76    61    62    General Mean = 72.63


A. Calculation of Sums of Squares

Step 1  Correction term (C) = (3486)²/48 = 253,171

Step 2  Total Sum of Squares
        = (64² + 72² + . . . + 70² + 68²) - C
        = 262,364 - 253,171 = 9193

Step 3  Sum of Squares among Means of A, B, C, D, E, F, G and H
        = [(432)² + (468)² + (492)² + (510)² + (390)² + (456)²
           + (366)² + (372)²]/6 - C
        = 1,540,188/6 - 253,171 = 3527

Step 4  Sum of Squares within Conditions A, B, C, D, E, F, G and H
        = Total SS - Among Means SS
        = 9193 - 3527 = 5666

B. Summary: Analysis of Variance

                                  Sums of    Mean Square
Source of Variation        df     Squares    (Variance)      SD
Among the means of
  conditions                7      3527        503.9
Within conditions          40      5666        141.6        11.9
Total                      47      9193

F = 503.9/141.6 = 3.56        From Table F for n₁ = 7 and n₂ = 40
                              F at .05 = 2.25
                              F at .01 = 3.12

C. Tests of Differences by Use of t

For df = 40, t at .05 = 2.02 (Table D)
             t at .01 = 2.71

SE of the difference = 11.9 √(1/6 + 1/6)
                     = 11.9 × .577
                     = 6.87

D at .05 = 2.02 × 6.87 = 13.9
D at .01 = 2.71 × 6.87 = 18.6

Largest difference is between D and G = 24
Smallest difference is between G and H = 1

Distribution of mean differences

Difference     f
  22-24        2      Approximately 5 differences significant at .01 level
  19-21        3      Approximately 10 differences significant at .05 level
  16-18        3
  13-15        4
  10-12        4
   7-9         3
   4-6         5
   1-3         4
              28
Step 1

Correction term (C). When the SD is calculated from original
measures or raw scores,* the formula $SD^2 = \frac{\Sigma x'^2}{N} - C^2$ (in which x' is a
deviation from the assumed mean) becomes $SD^2 = \frac{\Sigma X^2}{N} - M^2$. The
correction (C) equals M directly in this form of the equation, since
C = M - AM and the AM (assumed mean) here is zero. Replacing σ² by
$\frac{\Sigma x^2}{N}$ we have that

$$\frac{\Sigma x^2}{N} = \frac{\Sigma X^2}{N} - M^2$$

Now if the correction term M² is written $\frac{(\Sigma X)^2}{N^2}$, we may multiply
this equation through by N to find that

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

In Table 35 the correction term $\frac{(\Sigma X)^2}{N}$ is 253,171. This correction is
applied to the sum of squares, $\Sigma X^2$.

* See page 54. In analysis of variance calculations it is usually more convenient
to work with original measures or raw scores.

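Step 2

Total sum of squares around the general mean. Since
$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$, we need only square and sum the original scores
and subtract the correction term to find the total SS. In Table 35 the
total SS is 262,364 - 253,171 or 9193. The general formula is

$$SS_T = \Sigma X^2 - \frac{(\Sigma X)^2}{N} \qquad (69)$$

($SS_T$ around general mean using raw scores)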
Step 3

Sum of squares among the means obtained under the 8 conditions.
To find the sum of squares attributable to condition-differences
($SS_{M's}$), we must first square the sum of each column (i.e., each
condition), add these squares and divide by the number of scores in each
column. Subtracting the correction found in Step 1, we then get the
final SS among means. The same SS may be written in terms of the
deviations of the condition means around the general mean, each squared
deviation being multiplied by 6, the number of scores on which each
mean is based. In our example:

$SS_{M's}$ = 6[(72 - 72.63)² + (78 - 72.63)² + (82 - 72.63)²
          + (85 - 72.63)² + (65 - 72.63)² + (76 - 72.63)²
          + (61 - 72.63)² + (62 - 72.63)²] = 3527


When, as here, we are working with raw scores, the method of calculation
repeats Step 2, except that we divide the square of each column
total by 6, the number of scores in each column, before subtracting C.
The general formula is

$$SS \text{ (among means)} = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \cdots + \frac{(\Sigma X_k)^2}{n_k} - C \qquad (70)$$

(SS among means when calculation is with raw scores)


When the number of scores in the groups differ, the squares of the 
column sums will be divided by different ns before the correction is 
subtracted. (See page 282 for illustration.) 


Step 4 


Sum of squares within conditions (individual differences). The SS
within columns or groups ($SS_W$) always equals the $SS_T$ minus the
$SS_{M's}$. Subtracting 3527 from 9193, we have 5666. This $SS_W$ may also
be calculated directly from the data (see p. 296).


Step 5 


Calculation of the variances from each SS and analysis of the total
variance into its components is shown in the B part of Table 35.
Each SS becomes a variance when divided by the degrees of freedom
(df) allotted to it (p. 193). There are 48 scores in all in Table 35,
and hence there are (N - 1) or 47 df in all. These 47 df are allocated
in the following way. The df for "among the means of conditions"
are (8 - 1) or 7, less by one than the number of conditions. The df
within groups or within conditions are (47 - 7) or 40. This last
df may also be found directly: since there are (6 - 1) or 5 df for
each condition (N = 6 in each group), 5 × 8 (number of conditions)
gives 40 df for within groups. The variance among M's of groups
is 3527/7 or 503.9; and the variance within groups is 5666/40 or
141.6.

If N = number of scores in all and k = number of categories or
groups, we have for the general case that

df for total SS = (N - 1)
df for within groups SS = (N - k)
df for among means of groups SS = (k - 1)
Also: (N - 1) = (N - k) + (k - 1)


Step 6 


In the present problem the null hypothesis asserts that the 8 sets
of scores are in reality random samples drawn from the same normally
distributed population, and that the means of conditions A, B,
C, D, E, F, G and H will differ only through fluctuations of sampling.
To test this hypothesis we divide the "among means" variance by the
"within groups" variance and compare the resulting variance ratio,
called F, with the F-values in Table F (see p. 429). The F in our
problem is 3.56 and the df are 7 for the numerator (n₁) and 40 for the
denominator (n₂). Entering Table F, we read from column 7 (n₁)
and row 40 (n₂) that an F of 2.25 is significant at the .05 level and
an F of 3.12 is significant at the .01 level. Only the .05 and .01 points
are given in the table. These entries mean that, for the given df's,
variance ratios or F's as large as 2.25 and 3.12 can be expected once in
20 and once in 100 trials, respectively, when the null hypothesis is true.
Since our F is larger than the .01 level, it would occur less than once
in 100 trials by chance. We reject the null hypothesis, therefore, and
conclude that the means of our 8 groups do in fact differ.
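
Steps 1 through 6 can be retraced from the column sums alone. The sketch below is an illustration under stated assumptions, not the author's own procedure: it takes the 8 column sums and the sum of squared raw scores (262,364) from Table 35, assumes equal groups of 6, and uses a hypothetical function name. The critical F values still have to be read from Table F.

```python
def one_way_anova_from_sums(group_sums, n_per_group, sum_of_squared_scores):
    """One-criterion analysis of variance for equal-sized groups.

    Works from the column sums and the sum of squared raw scores.
    """
    k = len(group_sums)
    n_total = k * n_per_group
    correction = sum(group_sums) ** 2 / n_total                          # Step 1
    ss_total = sum_of_squared_scores - correction                        # Step 2
    ss_among = sum(s ** 2 for s in group_sums) / n_per_group - correction  # Step 3
    ss_within = ss_total - ss_among                                      # Step 4
    df_among, df_within = k - 1, n_total - k                             # Step 5
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within
    return {
        "correction": correction,
        "ss_total": ss_total,
        "ss_among": ss_among,
        "ss_within": ss_within,
        "df_among": df_among,
        "df_within": df_within,
        "ms_among": ms_among,
        "ms_within": ms_within,
        "F": ms_among / ms_within,                                       # Step 6
    }

# Values taken from Table 35.
table_35_sums = [432, 468, 492, 510, 390, 456, 366, 372]
result = one_way_anova_from_sums(table_35_sums, 6, 262_364)

for name, value in result.items():
    print(f"{name:>10}: {value:.2f}")
```

The printed F agrees with the 3.56 of Table 35; small differences in the other entries arise only because the text rounds C to 253,171.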

F furnishes a comprehensive or over-all test of the significance of
the differences among means. A significant F does not tell us which
means differ significantly, but that at least one is reliably different
from some others. If F is not significant there is no reason for further
testing, as none of the mean differences will be significant (see p. 281).
But if F is significant, we may proceed to test the separate differences
by the t-test (p. 427) as shown in Table 35 C.


Step 7


The best estimate which we can make of the uncontrolled variability
arising from individual differences is given by the SD of 11.9
computed from the "within groups" variance given in Table 35 B.
This SD is based upon all of our data and is a measure of subject
variability after the systematic effects arising from differences in
column means have been allowed for. In testing mean differences by
the t-test, therefore (Table 35 C), the SD of 11.9 is used throughout
instead of the SD's calculated from the separate columns, A, B, C, D,
E, F, G and H. The standard error of any mean ($SE_M$) is $SD_W/\sqrt{N}$
or 11.9/√6 = 4.86. And the SE of the difference (D) between any
two means is $SE_D = \sqrt{4.86^2 + 4.86^2}$ or 6.87. A general formula for
calculating $SE_D$ directly is

$$SE_D = SD_W \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (71)$$

(standard error of the difference between any two means in
analysis of variance)

where $SD_W$ is the within-groups SD, and $n_1$ and $n_2$ are the sizes of the
samples or groups being compared.

The means of the 8 groups in Table 35 range from 61 to 85, and
the mean differences from 24 to 1. To determine the significance of
the difference between any two selected means we must compute a
t-ratio by dividing the given mean difference by its $SE_D$. The resulting
t is then compared with the t in Table D for 40 df, viz., the number
of df upon which our $SD_W$ is based. A more summary approach
than this is to compute that difference among means which for 40 df
will be significant at the .05 or the .01 level and check our differences
against these standards. This is done in Table 35 C. We
know from Table D that for 40 df, a t of 2.02 is significant at the .05
level; and a t of 2.71 is significant at the .01 level. Since t = mean
difference/$SE_D$, we may substitute 2.02 for t in this equation and
6.87 for $SE_D$ to find that a difference of 13.9 is significant at the .05
level. Using the same procedure, we substitute 2.71 for t in the equation
to find that a difference of 18.6 is significant at the .01 level.

Eight means will yield (8 × 7)/2 or 28 differences. From the distribution
of these 28 differences (Table 35 C) it is clear that approximately
5 differences are significant at the .01 level (i.e., are 18.6 or
more); and approximately 10 at the .05 level (i.e., are 13.9 or more).
The largest difference is 24 and the smallest is 1.
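
The tally of significant differences can be reproduced by listing all 28 pairs. The sketch below is illustrative only: it assumes the 8 means, the within-groups SD of 11.9, and the t values of 2.02 and 2.71 quoted in the text, and applies formula (71) with 6 scores in each group. The variable names are hypothetical.

```python
import math
from itertools import combinations

# Means of the 8 conditions (Table 35) and quantities quoted in the text.
means = {"A": 72, "B": 78, "C": 82, "D": 85, "E": 65, "F": 76, "G": 61, "H": 62}
sd_within = 11.9        # within-groups SD, based on 40 df
n_per_group = 6
t_05, t_01 = 2.02, 2.71  # Table D values for 40 df

# Formula (71): standard error of the difference between two means.
se_diff = sd_within * math.sqrt(1 / n_per_group + 1 / n_per_group)

# Smallest differences that reach the .05 and .01 levels.
d_05 = t_05 * se_diff
d_01 = t_01 * se_diff

# Absolute difference for every pair of conditions.
differences = {
    (a, b): abs(means[a] - means[b]) for a, b in combinations(means, 2)
}
significant_05 = [pair for pair, d in differences.items() if d >= d_05]
significant_01 = [pair for pair, d in differences.items() if d >= d_01]

print(f"SE_D = {se_diff:.2f}, D.05 = {d_05:.1f}, D.01 = {d_01:.1f}")
print(len(differences), len(significant_05), len(significant_01))
```

Because the same $SE_D$ serves every pair when groups are equal in size, one critical difference per level is enough; with unequal groups formula (71) would have to be applied pair by pair.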




Discussion *


A few additional comments may clarify the calculations in Table
35.

(1) First, it must be remembered that we are testing the null
hypothesis, the hypothesis that there are no true differences among
our 8 condition-means. Stated differently, we are testing the hypothesis
that our 8 groups are in reality random samples drawn from
the same normally distributed population. The F-test refutes the
null hypothesis by demonstrating differences among our means which
cannot be explained by chance: i.e., differences larger than those
which would occur by sampling accidents once in 100 trials if the
null hypothesis were true.

(2) The 47 df (48 - 1) in the table are broken down into 7 df
which are allotted to the 8 condition-means and 40 df which are
allotted to individual differences (variations within groups or columns).
Variances are calculated by dividing each SS by its own df.

(3) In problems like that of Table 35 (where there is only one
criterion of classification), all 3 variances (total, among means and
within groups) are in effect estimates of the variance in the population
of scores from which our 8 samples are drawn. Only two of these
variances are independent: the variance among condition-means and
the variance within groups, since the total variance is composed of these
two. These two independent estimates of population variance are used in
computing the variance ratio and making the F-test. When samples are
strictly random these two variances are equal and F is 1.00. Moreover,
when F is 1.00, the variance among group means is no greater
than the variance within groups; or, put differently, group-means
differ no more than do the individuals within the groups. The extent
to which F is greater than 1.00 becomes, then, a measure of the significance
of the differences among group means. The larger the F the
greater the probability that group mean differences are greater than
individual variation, sometimes called "experimental error."

(4) According to the traditional method of treating a problem like
that of Table 35, 8 SD's would first be computed, one around each of
the 8 column means. From these SD's, SE's of the means and SE's of
the differences between pairs of means would be calculated. A t-test
would then be made of the differences between any two given means
and the significance of this difference determined from Table D.

Analysis of variance is an improvement over this procedure in several
respects. In Table 35 we first compute an F-ratio which tells us
whether any mean differences are significant. If F is significant, we
may then compute a single $SE_D$. This $SE_D$ is derived from the $SD_W$
calculated from the 8 groups after systematic mean-differences have
been removed. Moreover, this within-groups SD (based as it is upon
all 48 scores and with 40 df) furnishes a better (i.e., more reliable)
measure of uncontrolled (or experimental) variation in the table
than could be obtained from SD's based upon only 6 scores and 5 df.
Pooling of sums to obtain the within-groups SD is permissible, since
the deviations in each group have been taken from their own mean.

* See Garrett, H. E., and Zubin, J., "The Analysis of Variance in Psychological
Research," Psychol. Bull., 1943, 40, 233-267.

(5) If the F-test refutes the null hypothesis we may use the t-test
to evaluate mean-differences. If the F-test does not refute the null
hypothesis there is no justification for further testing, as pairs of
means will not differ significantly unless there are a number of them,
in which case one or two differences might by chance equal or
approach significance.*

* In 100 strictly random differences, 5 will be significant at the .05 level; that
is, 2½% will exceed 1.96σ at each end of the curve of differences (p. 188).
Hence in 28 differences (Table 35 C) 1 or 2 might be significant at the
.05 level (28 × .05 = 1.40) if differences are randomly distributed around zero.


2. When there are only two means to be compared


In order to provide a further comparison of analysis of variance
with the methods of Chapter 9, example (4), page 223, is solved in
Table 36. This second example will show that when only two means
are to be compared, the F-test reduces to the t-test.


TABLE 36 Solution of Example (4), page 223, through methods of analysis
of variance

Scores:
        Class 1 (N₁ = 6)      Class 2 (N₂ = 10)
              28                    20
              35                    16
              32                    25
              24                    34
              26                    20
              35                    28
                                    31
                                    24
                                    27
                                    15
Sum          180                   240
M             30                    24

A. Sums of Squares

1. Correction: (420)²/16 = 11025
2. SS total = 28² + 35² + . . . + 15² - C
            = 11622 - 11025 = 597
3. SS between means = (180)²/6 + (240)²/10 - C
                    = 11160 - 11025 = 135
4. SS within classes = 597 - 135 = 462

B. Analysis of Variance

Source             df     SS     MS(V)
Between means       1    135      135
Within classes     14    462       33
Total              15    597

F = 135/33 = 4.09          From Table F
t = √F = 2.02              F at .05 level = 4.60
                           F at .01 level = 8.86

Step 1


The sum of all of the 16 scores is 180 + 240 or 420; and the correction
(C) is, accordingly, (420)²/16 or 11025. See page 275.


Step 2

When each score has been squared and the correction subtracted
from the total, the SS around the general mean is 597 by formula
(69), page 276.


Step 3

The sum of squares between means (135) is found by squaring
the sum of each column, dividing the first by 6 (n₁) and the second
by 10 (n₂) and subtracting C.


Step 4


The SS within groups is the difference between the total SS and the
SS between means. Thus the SS within groups = 597 - 135 = 462.


Step 5


The analysis of variance is shown in Table 36 B. The total SS is divided
into SS between means of groups and SS within groups. Since there
are 16 scores in all, there are (N - 1) or 15 df for "total." The SS
between means is allotted (k - 1) or 1 df (k = 2). The remaining 14 df
are assigned to within groups and may be found either by subtracting 1
from 15 or by adding the 5 df in Class 1 to the 9 df in Class 2. Mean
squares or variances are obtained by dividing each SS by its appropriate df.


Step 6


The variance ratio or F is 135/33 or 4.09. The df for between
means is 1 (n₁) and the df for within groups is 14 (n₂). Entering
Table F with these n's we read in column 1 and row 14 that the .05
level is 4.60 and the .01 level is 8.86. Our F of 4.09 does not quite
reach the .05 level so that our mean difference of 6 points must be
regarded as not significant. The difference between the two means
(30 - 24) is not large enough, therefore, to be convincing; or, stated
more mathematically, a difference of 6 can be expected to occur too
frequently to render the null hypothesis untenable.

When there are only two means to be compared as here, F = t² or
t = √F and the two tests (F and t) give exactly the same result. In
Table 36 B, for instance, t = √4.09 or 2.02 which is the t previously
found in example (4) on page 223. From Table D we have found
(p. 224) that for 14 df the .05 level of significance for this t is 2.14.
Our t of 2.02 does not quite reach this level and hence (like F) is not
significant. If we interpolate between the .05 point of 2.14 and the
.10 point of 1.76 in Table D, our t of 2.02 is found to fall approximately
at .07. In 100 repetitions of this experiment, therefore, we
can expect a mean difference of 6 or more to occur about 7 times,
too frequently to be significant.
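
The identity F = t² for two groups can be confirmed on the scores of Table 36. The sketch below is illustrative, not the author's worked solution: the scores are those of the table as read here (they reproduce the sums 180 and 240 and the ΣX² of 11622), and the t is the ordinary pooled-variance t for independent groups.

```python
import math

# Scores of Table 36 (assumed reading of the table).
class_1 = [28, 35, 32, 24, 26, 35]
class_2 = [20, 16, 25, 34, 20, 28, 31, 24, 27, 15]
all_scores = class_1 + class_2

# Analysis of variance, following Steps 1 to 6.
correction = sum(all_scores) ** 2 / len(all_scores)
ss_total = sum(x ** 2 for x in all_scores) - correction
ss_between = (sum(class_1) ** 2 / len(class_1)
              + sum(class_2) ** 2 / len(class_2) - correction)
ss_within = ss_total - ss_between
df_within = len(all_scores) - 2
ms_within = ss_within / df_within
f_ratio = (ss_between / 1) / ms_within   # 1 df between two means

# t-test with the pooled (within-groups) variance.
mean_1 = sum(class_1) / len(class_1)
mean_2 = sum(class_2) / len(class_2)
se_diff = math.sqrt(ms_within * (1 / len(class_1) + 1 / len(class_2)))
t_ratio = (mean_1 - mean_2) / se_diff

print(f"F = {f_ratio:.2f}, t = {t_ratio:.2f}, t squared = {t_ratio ** 2:.2f}")
```

Both routes use the same within-groups variance of 33, which is why the two tests must agree.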


3. Example (5), page 225, solved by analysis of variance 


In problems requiring the comparison of two group means either
F or t may be employed. From the standpoint of calculation, F is
perhaps somewhat easier to apply. In example (5), page 225, it is
easier to calculate t because raw scores are not given. But F may
be calculated if desired in the following way. The general mean for
the two groups is (40.39 × 31 + 35.81 × 42) divided by 73, or 37.75:
it is the weighted mean obtained from the two group means.
The SS between the means of the groups of boys and girls is
31(40.39 - 37.75)² + 42(35.81 - 37.75)² or 374.13; namely, the squared
deviation of each group mean from the general mean weighted in each
case by the N of the group.

To get the SS within groups we simply square each SD and multiply
by (N - 1), remembering that $SD^2 = \frac{\Sigma x^2}{N - 1}$ (p. 189). In
example (5) we find that (8.69)² × 30 = 2265.48; and (8.33)² × 41
= 2844.95. The sum of these two is 5110.43, the SS within groups.
The complete analysis of variance and F test are shown in Table 37;
F = 5.20 and t = √F or 2.28, checking the result given on page 225.
Our F of 5.20 exceeds the .05 level of 3.98 but does not reach the .01
level of 7.01. As before, F and t give identical results.
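
When only means, SD's and N's are available, the same analysis can be sketched in code. The function below is an illustration with a hypothetical name; it assumes, as the text does, that each SD was computed with (N - 1) in the denominator, and it uses the group values quoted for example (5).

```python
import math

def anova_from_summary(means, sds, ns):
    """F and t-equivalent for independent groups, from summary figures only.

    Assumes each SD was computed with (N - 1) in the denominator, so that
    SD^2 * (N - 1) recovers the sum of squares within the group.
    """
    n_total = sum(ns)
    general_mean = sum(m * n for m, n in zip(means, ns)) / n_total
    ss_between = sum(n * (m - general_mean) ** 2 for m, n in zip(means, ns))
    ss_within = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    df_between = len(means) - 1
    df_within = n_total - len(means)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return general_mean, ss_between, ss_within, df_within, f_ratio

# Group values quoted in the text for example (5).
general_mean, ss_between, ss_within, df_within, f_ratio = anova_from_summary(
    means=[40.39, 35.81], sds=[8.69, 8.33], ns=[31, 42]
)
t_equivalent = math.sqrt(f_ratio)   # valid only when two means are compared

print(f"general mean = {general_mean:.2f}")
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_ratio:.2f} on 1 and {df_within} df, t = {t_equivalent:.2f}")
```

Carrying full precision instead of the rounded general mean of 37.75 changes the later decimals slightly but leaves F at 5.20 and t at 2.28.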


TABLE 37 Solution of example (5), page 225, by analysis of variance


A. Sums of Squares and General Mean

1. General mean = (40.39 × 31 + 35.81 × 42)/73 = 37.75
2. SS between means:
   31(40.39 - 37.75)² + 42(35.81 - 37.75)² = 374.13
3. SS within groups:
   30(8.69)² + 41(8.33)² = 5110.43

B. Analysis of Variance

                                Sums of     Mean Square
Source of Variation      df     Squares     (Variance)
Between means             1      374.13       374.1
Within groups            71     5110.43        71.9

F = 374.1/71.9 = 5.20          From Table F
t = √F = √5.20 = 2.28          df = 1/71
                               F at .05 = 3.98
                               F at .01 = 7.01


III. The Significance of the Difference between Means
Obtained from Correlated Groups


1. When the same group is measured more than once
(single group method)


Means are correlated when the two sets of scores achieved by the
group from which the means were derived are correlated. When
a test is given and then repeated, analysis of variance may be
used to determine whether the mean change is significant. The
experimental design here is essentially the same as that of the Single
Group Method of Chapter 9, page 225. Hence example (7), page 227,
is used in Table 38 to illustrate the methods of analysis of variance
and to provide a comparison with the difference-method of page 227.


TABLE 38 Solution of example (7), page 227, by analysis of variance


A. Sums of Squares

1. Correction = (1240)²/24 = 1537600/24 = 64066.67
2. Total sum of squares = 68952 − 64066.67 = 4885.33
3. Between trials sum of squares:
   [(572)² + (668)²]/12 − 64066.67 = 384.00
4. Among subjects' sum of squares:
   68391 − 64066.67 = 4324.33
5. Interaction sum of squares = 4885.33 − (384.00 + 4324.33) = 177.00

B. Analysis of Variance

Source of Variation    df    Sums of Squares    Mean Square (Variance)    SD
Between trials          1         384.00              384.00
Among subjects         11        4324.33              393.12
Interaction            11         177.00               16.09             4.01
Total                  23        4885.33

F trials   = 384.00/16.09 = 23.86        t = √23.86 = 4.88
F subjects = 393.12/16.09 = 24.43

From Table F            Trials          Subjects
                        df = 1/11       df = 11/11
F at .05 =                4.84            2.82
F at .01 =                9.65            4.46

The procedures for the analysis of variance in example (7) differ
in at least two ways from the methods of Section II. First, since
there is the possibility of correlation between the scores achieved by
the 12 subjects on the first and fifth trials, the two sets of scores
should not at the outset be treated as independent (random) samples.
Secondly, classification is now in terms of two criteria: (a)
trials and (b) subjects. Because of these two criteria, the total SS
must be broken down into three parts: (a) SS attributable to trials;
(b) SS attributable to subjects; and (c) a residual SS usually called
“interaction.” Steps in the calculation of these three variances,
shown in Table 38 A, may be summarized as follows.


Step 1

Correction (C). As in Section II, $C = (\Sigma X)^2/N$. In example (7) C is
(1240)²/24 or 64066.67.

Step 2

Total SS around general mean. Again the calculation repeats
the procedure of Section II.

$$SS_T = (50^2 + 42^2 + \cdots + 72^2 + 50^2) - 64066.67 = 68952 - 64066.67 = 4885.33$$

Step 3

SS between the means of trials. There are two trials of 12 scores
each. Therefore,

$$SS_{trials} = \frac{(572)^2 + (668)^2}{12} - 64066.67 = 64450.67 - 64066.67 = 384.00$$

Step 4

SS among the means of subjects. A second “between means” SS
is required to take care of the second criterion of classification.
There are 12 subjects and each has two trials. Hence,

$$SS_{subjects} = \frac{112^2 + 82^2 + \cdots + 134^2 + 88^2}{2} - 64066.67 = 68391.00 - 64066.67 = 4324.33$$


Step 5 


Interaction SS. The residual variation or interaction is whatever
is left when the systematic effects of trial differences and subject
differences have been removed from the total SS. Interaction
measures the tendency for subject performance to vary along with trials:
it measures the factors attributable neither to subjects nor trials
acting alone, but rather to both acting together. Interaction is
obtained most simply * by subtracting trials SS plus subjects SS from
total SS. Thus

$$\text{Interaction } SS = SS_T - (SS_{subjects} + SS_{trials}) = 4885.33 - (384 + 4324.33) = 177$$


Step 6 


As before, SS's become variances when divided by their appropriate
df. Since there are 24 scores in all we have (24 − 1) or 23 df
for the total SS. Two trials receive 1 df, and 12 subjects, 11. The
remaining 11 df are assigned to interaction. The rule is that the df
for interaction is the product of the df for the two interacting
variables, here 1 × 11. In general if N = total number of scores,
r = rows and k = columns, we have

df for total SS           = (N − 1)
df for column SS (trials) = (k − 1)
df for row SS (subjects)  = (r − 1)
df for interaction SS     = (k − 1)(r − 1)

* Interaction may be calculated directly from the data,

The three measures of variance appear in Table 38. Note that we
may now calculate two F's, one for trial differences and one for
subject differences. In both cases the interaction variance is placed in
the denominator of the variance ratio, since it is our best estimate of
residual variance (or experimental error) after the systematic
influences of trials and subjects have been removed. The F for trials is
23.86 and is much larger than the 9.65 we find in Table F for the .01
point when n₁ = 1 and n₂ = 11. This means that the null hypothesis
with respect to trials is untenable and must be abandoned. The
evidence is strong that real improvement took place from trial 1 to
trial 5.
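
The partition described in Steps 1 to 6 can be sketched in a few lines. This is only an illustration under stated assumptions: the raw scores of example (7) are not reproduced here, so the sketch starts from the intermediate sums quoted in the steps (1240, 68952, 64450.67 and 68391), and all variable names are hypothetical.

```python
import math

N_SUBJECTS, N_TRIALS = 12, 2
n_scores = N_SUBJECTS * N_TRIALS

correction = 1240 ** 2 / n_scores          # C = (sum of scores)^2 / N
ss_total = 68952 - correction              # sum of squared scores minus C
ss_trials = 64450.67 - correction          # (sum of squared trial sums)/12 minus C
ss_subjects = 68391 - correction           # (sum of squared subject sums)/2 minus C
ss_interaction = ss_total - (ss_trials + ss_subjects)

df_trials = N_TRIALS - 1
df_subjects = N_SUBJECTS - 1
df_interaction = df_trials * df_subjects   # product of the two df

# The interaction variance is the denominator of both variance ratios.
ms_interaction = ss_interaction / df_interaction
f_trials = (ss_trials / df_trials) / ms_interaction
f_subjects = (ss_subjects / df_subjects) / ms_interaction
t_trials = math.sqrt(f_trials)

print(round(ss_interaction, 2), round(ms_interaction, 2))
print(round(f_trials, 2), round(f_subjects, 2), round(t_trials, 2))
```

The printed values agree with Table 38 to within rounding in the second decimal place.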

Ordinarily in most two-criteria experiments we are concerned 
primarily with one criterion, as here. It is possible, however (and 
sometimes desirable), to test the second criterion—viz., differences 
among subjects. The F for subjects is 24.43 and again is far larger 
than the .01 point of 4.46 in Table F for n₁ = 11 and n₂ = 11. It is
obvious that some subjects were consistently better than others 
without regard to trial. 

Since there are two trials, we have two trial means. Hence, if we 
compute a t from the F for trials, it should be equal to that found by
the difference-method. The F of 23.86 yields a t of √23.86 or 4.88
which checks the t of 4.88 on page 227. 

Computations needed for the difference-method of example (7), 
page 227, are somewhat shorter than are those for analysis of vari- 
ance, and the difference-method would probably be preferred if one 
wished to determine only the significance of the difference between 
the two trial means. If, however, the significance of the differences 
in the second criterion (differences among subject means) is wanted, 
analysis of variance is more useful. Moreover, through a further 
analysis of variance we can determine whether individual differences 
(differences among subjects) are significantly greater than practice 
differences (differences between trials). Thus if we divide the
$V_{subjects}$ by the $V_{trials}$, the resulting F is 393.12/384 or 1.02. For
n₁ = 11 and n₂ = 1, the .05 point is 243. Hence, in the present
experiment, at least, we may feel quite sure that individual differences
were no greater than practice differences. Since the reverse is usually
true, the implication to be drawn is that practice in the present
experiment must have been quite drastic: a conclusion borne out by
the F-test for trials.


2. When in evaluating the differences between two or more groups on a
test we wish to allow for initial differences among the groups on
the same or different measures


In many experimental situations, especially in the fields of 
memory and learning, we wish to compare groups that are initially 
unlike, either in the variable under study or some presumably related 
variable. In Chapter 9, two methods were given for equating groups
initially, that is, having them “start from scratch.” In the first method,
experimental and control groups were made equivalent initially by
person-to-person matching; and in the second method, groups were
matched initially for mean and σ in one or more related variables.
Neither of these methods is entirely satisfactory and neither is
always easy to apply. Equivalent groups often necessitate a sharp
reduction in size of N (and also in variability) when the matching of
scores is difficult to accomplish. Furthermore, in matched groups it
is often difficult to get the correlation between the matching variable
and the experimental variable in the population from which our
samples were drawn (p. 231).

Analysis of covariance represents an extension of analysis of vari- 
ance to allow for the correlation between initial and final scores. 
Covariance analysis is especially useful to experimental psychol- 
ogists when for various reasons it is impossible or quite difficult to 
equate control and experimental groups at the start: a situation 
which often obtains in actual experiments. Through covariance one 
is able to effect adjustments in final or terminal scores which will 
allow for differences in some initial variable. (For many other uses 
of covariance the reader should consult the references on page 268.) 

Table 39 presents a numerically simple illustration of the application
of analysis of covariance. The data in Example (1) are artificial
and are purposely meager so that the procedure will not be swamped
by the numerical calculations.


Example (1) Suppose that fifteen children have been given
one trial (X) of a test. Five are then assigned at random to each
of three groups, A, B and C. After two weeks, say, group A is
praised lavishly, group B scolded severely and the test repeated
(Y). At the same time, a second trial (Y) is also given to group C,
the control group, without comment.


TABLE 39 To illustrate covariance analysis


Original Data [Example (1)]

        Group A (praised)     Group B (scolded)     Group C (control)
         X1    Y1   X1Y1       X2    Y2   X2Y2       X3    Y3   X3Y3
         15    30    450       25    28    700        5    10     50
         10    20    200       10    12    120       10    15    150
         20    25    500       15    20    300       20    20    400
          5    15     75       15    10    150        5    10     50
         10    20    200       10    10    100       10    10    100
Sums     60   110   1425       75    80   1370       50    65    750
M's      12    22              15    16              10    13

For all 3 groups:   ΣX = 185     ΣY = 255     ΣXY = 3545
                    ΣX² = 2775   ΣY² = 5003
Step 1. Correction terms:
    Cx  = (185)²/15 = 2282
    Cy  = (255)²/15 = 4335
    Cxy = (185 × 255)/15 = 3145

Step 2. Total SS
    For x  = 2775 − 2282 = 493
        y  = 5003 − 4335 = 668
        xy = 3545 − 3145 = 400

Step 3. Among Group Means SS
    For x  = (60² + 75² + 50²)/5 − 2282 = 63
        y  = (110² + 80² + 65²)/5 − 4335 = 210
        xy = (60 × 110 + 75 × 80 + 50 × 65)/5 − 3145 = 25

Step 4. Within Groups SS
    For x  = 493 − 63 = 430
        y  = 668 − 210 = 458
        xy = 400 − 25 = 375

Step 5. Analysis of Variance of X and Y scores, taken separately

Source of Variation    df    SSx    SSy    MSx (Vx)    MSy (Vy)
Among Means             2     63    210      31.5        105
Within Groups          12    430    458      35.8        38.2
Total                  14    493    668

Fx = 31.5/35.8 = .88             From Table F, df 2/12:
Fy = 105/38.2 = 2.75               F at .05 level = 3.88
                                   F at .01 level = 6.93

Neither F is significant. Mean differences on final trial approach
significance.
Fx = .88 shows that the experimenter was quite successful in
getting random samples in Groups A, B, C.


Step 6. Computation of Adjusted SS for Y: i.e., SSy.x

    Total SS       = 668 − (400)²/493 = 343
    Within SS      = 458 − (375)²/430 = 131
    Among Means SS = 343 − 131 = 212

Analysis of Covariance

Source of Variation    df    SSx    SSy    SSxy    SSy.x    MSy.x (Vy.x)    SDy.x
Among Means             2     63    210      25      212        106
Within Groups          11*   430    458     375      131         12          3.46
Total                  13    493    668     400      343

Fy.x = 106/12 = 8.83             From Table F, df 2/11:
                                   F at .05 level = 3.98
                                   F at .01 level = 7.20

* 1 df lost, see page 294.

Step 7. Correlation and Regression

    r total       = 400/√(493 × 668) = .70      b total       = 400/493 = .81
    r among means = 25/√(63 × 210)   = .22      b among means = 25/63   = .40
    r within      = 375/√(430 × 458) = .84      b within      = 375/430 = .87

Step 8. Calculation of Adjusted Y-Means

Groups            N     Mx     My     My.x (adjusted)
A                 5     12     22         22.3
B                 5     15     16         13.7
C                 5     10     13         15.0
General Means          12.3    17         17.0

    My.x = My − b(Mx − GMx)
    For Group A: My − bx = 22 − .87(12 − 12.3) = 22.3
              B: My − bx = 16 − .87(15 − 12.3) = 13.7
              C: My − bx = 13 − .87(10 − 12.3) = 15.0

Step 9. Significance of differences among adjusted Y-Means

    SDy.x  = √12 = 3.46
    SEMy.x = √(12/5) = 1.55

    SE between any two adjusted means = SDy.x √(1/N1 + 1/N2)        (71)
                                      = 3.46 √(1/5 + 1/5) = 3.46 × .63 = 2.18

    For df = 11, t.05 = 2.20; t.01 = 3.11 (Table D)

    Significant difference at .05 level = 2.20 × 2.18 = 4.80
    Significant difference at .01 level = 3.11 × 2.18 = 6.78
    A differs significantly from both B and C at .01 level.
    B and C are not significantly different.

We thus have three groups—two experimental and one control— 
with initial scores (X) and final scores (Y). The problem is to deter- 
mine whether the groups differ in the final trial (Y) as a result of the 
incentives. The method permits us to determine whether initial 
differences in (X) are important and to allow for them if they are. 


Table 39 gives the necessary computations. The following steps
outline the procedure.


Step 1 


Correction term (C). There are three correction terms to be
applied to SS's, one for X, one for Y and one for the cross products
in X and Y. Calculation of $C_x$ and $C_y$ follows the method of page
275. The formula for $C_{xy}$ is $\frac{\Sigma X \times \Sigma Y}{N}$, or in our problem $\frac{185 \times 255}{15}$.


Step 2


SS for totals. Again we have three SS's for totals: $SS_x$, $SS_y$ and
$SS_{xy}$, of which only $SS_{xy}$ is new. The formula for $SS_{xy}$ is

$$SS_{xy} = \Sigma XY - C_{xy} \qquad (72)$$

(sum of squares for xy in analysis of covariance)


The $SS_{xy}$ is found by multiplying pairs of X and Y scores, summing
over the range and subtracting $C_{xy}$: thus (15 × 30 + 10 × 20
+ ... + 10 × 10) − 3145 = 400.


Step 3 


SS among means of the three groups. Calculations shown in Table
39 follow the method of page 289 for X and Y. The “among means”
term for xy is the sum of the products of the corresponding X and Y
column totals (e.g., 60 × 110 + 75 × 80 + 50 × 65) divided by 5 and
minus $C_{xy}$.


Step 4 


SS within groups. For x, y, and xy these SS's are found by
subtracting the “among means” SS's from the $SS_T$.
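
Steps 1 to 4 can be verified directly from the fifteen pairs of scores in Table 39. The sketch below is a minimal illustration; the helper sums_of_products and the dictionary layout are assumptions made for the example. Because the correction term is carried to full decimals rather than rounded to 2282, the x values come out as 493.3 and 63.3 instead of 493 and 63.

```python
groups = {
    "A (praised)": {"x": [15, 10, 20, 5, 10], "y": [30, 20, 25, 15, 20]},
    "B (scolded)": {"x": [25, 10, 15, 15, 10], "y": [28, 12, 20, 10, 10]},
    "C (control)": {"x": [5, 10, 20, 5, 10], "y": [10, 15, 20, 10, 10]},
}

def sums_of_products(groups, a, b):
    """Return (total, among means, within groups) sums of products of a and b.

    When a == b this is an ordinary sum of squares.
    """
    all_a = [v for g in groups.values() for v in g[a]]
    all_b = [v for g in groups.values() for v in g[b]]
    correction = sum(all_a) * sum(all_b) / len(all_a)
    total = sum(p * q for p, q in zip(all_a, all_b)) - correction
    among = sum(sum(g[a]) * sum(g[b]) / len(g[a]) for g in groups.values()) - correction
    return total, among, total - among

ss_x = sums_of_products(groups, "x", "x")
ss_y = sums_of_products(groups, "y", "y")
ss_xy = sums_of_products(groups, "x", "y")

for label, values in (("x", ss_x), ("y", ss_y), ("xy", ss_xy)):
    print(label, [round(v, 1) for v in values])
```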


Step 5 


A preliminary analysis of variance of the X and Y trials, taken
separately, has been made in Table 39. The F test applied to the
initial (X) scores ($F_x$ = .88) falls far short of significance at the .05
level, from which it is clear that the X-means do not differ
significantly and that the random assignment of subjects to the three
groups was quite successful. The F-test applied to the final (Y)
scores ($F_y$ = 2.75) approaches closer to significance, but is still
considerably below 3.88, the .05 level. From this preliminary analysis
of variance of the Y-means alone we must conclude that neither
praise nor scolding is more effective in raising scores than is mere
repetition of the test.


Step 6

The computations carried out in this step are for the purpose of
correcting the final (Y) scores for differences in initial (X) scores.
The symbol $SS_{y.x}$ means that the $SS_y$ have been “adjusted” for any
variability in Y contributed by X, or that the variability in X is held
constant. The general formula (see p. 297) is

$$SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x} \qquad (73)$$

(SS in y when variability contributed by x has been removed
or held constant)

For $SS_T$ we have that $SS_{y.x} = 668 - \frac{(400)^2}{493}$ or 343; for $SS_{within}$
that $SS_{y.x} = 458 - \frac{(375)^2}{430} = 131$. The $SS_{y.x}$ for among means is the
adjusted $SS_T$ minus adjusted $SS_{within}$. This last $SS_{y.x}$ cannot readily be
calculated directly.*


From the various adjusted sums of squares the variances ($MS_{y.x}$)
can now be computed by dividing each SS by its appropriate df.
Owing to the restriction imposed by the use of formula (73)
(reduction of variability in X) 1 df is lost and the analysis of covariance
(Table 39) shows only 11 df for within groups instead of 12, and
only 13 instead of 14 for total.

The value of analysis of covariance becomes apparent in Table 39
when the F-test is applied to the adjusted among and within
variances. $F_{y.x}$ = 106/12 or 8.83, and is highly significant, far beyond
the .01 level (.01 = 7.20). This $F_{y.x}$ should now be compared with
the $F_y$ of 2.75 (p. 290) obtained before correcting for variability in
initial (X) scores. It is clear from $F_{y.x}$ that the three final means,
which depend upon the three incentives, differ significantly after
they have been adjusted for initial differences in X. To find which
of the three possible differences is significant or whether all are
significant we must apply the t-test (in Step 9).
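
Formula (73) and the resulting F can be sketched as follows, using the sums of squares and products from Table 39. The function name adjusted_ss is illustrative. Carrying full decimals gives an F of about 8.9; the 8.83 of the text comes from rounding the variances to 106 and 12 before dividing.

```python
def adjusted_ss(ss_y, ss_xy, ss_x):
    """Formula (73): SS in y with the variability contributed by x removed."""
    return ss_y - ss_xy ** 2 / ss_x

adj_total = adjusted_ss(ss_y=668, ss_xy=400, ss_x=493)
adj_within = adjusted_ss(ss_y=458, ss_xy=375, ss_x=430)
adj_among = adj_total - adj_within      # obtained by subtraction, not directly

df_among, df_within = 2, 11             # 1 df is lost from within groups
ms_among = adj_among / df_among
ms_within = adj_within / df_within
f_adjusted = ms_among / ms_within

print(round(adj_total), round(adj_within), round(adj_among))
print(round(ms_among), round(ms_within), round(f_adjusted, 2))
```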


Step 7 


An additional step is useful, however, before we proceed to the
t-test for adjusted means. From the SS's in x, y and xy it is possible
to compute several coefficients of correlation. These are helpful
in the interpretation of the result obtained in Step 6. The general
formula used is $r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \cdot \Sigma y^2}}$ (p. 139); it may be applied to the
appropriate SS's for total, among means and within groups.


*See McNemar, Q. op. cit., p. 324.

The within groups correlation of .84 is a better measure of the
relationship between initial (X) and final (Y) scores than is the
total correlation of .70, as systematic differences in means have
been eliminated from the within r. It is this high correlation between
X and Y which accounts for the marked significance among Y-means
when the variability in X is held constant. High correlation within
groups reduces the denominator of the variance ratio, $F_{y.x}$, while low
correlation between X and Y means (namely, .22) does not
proportionally affect the numerator. Thus we note that the within groups
variance of 38.2 is reduced through analysis of covariance to 12,
while the among means variance is virtually unchanged (from 105
to 106). When correlation among scores is high and correlation
among means low (as here), analysis of covariance will often lead to
a significant F when analysis of variance fails to reveal significant
differences among the Y-means. These two r's may be used,
therefore, in a preliminary way to decide whether analysis of covariance
is worth while.

Regression coefficients for total, among means and within groups
have been calculated by use of the formula $b = \frac{\Sigma xy}{\Sigma x^2}$ (p. 297). The
$b_{within}$ is the most nearly unbiased estimate of the regression of Y
on X, since any systematic influence due to differences among means
has been removed. Therefore, $b_{within}$ is used in the computation of
the adjusted Y-means in Step 8.
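
The three correlations and three regression coefficients of Step 7 all come from the same pair of formulas applied to different rows of the covariance table. A minimal sketch, with the helper name r_and_b assumed for illustration:

```python
import math

def r_and_b(ss_x, ss_y, ss_xy):
    """Correlation and regression coefficient from sums of squares and products."""
    r = ss_xy / math.sqrt(ss_x * ss_y)
    b = ss_xy / ss_x
    return r, b

r_total, b_total = r_and_b(493, 668, 400)
r_among, b_among = r_and_b(63, 210, 25)
r_within, b_within = r_and_b(430, 458, 375)

print(round(r_total, 2), round(b_total, 2))
print(round(r_among, 2), round(b_among, 2))
print(round(r_within, 3), round(b_within, 2))
```

The within-groups r falls almost exactly halfway between .84 and .85, so it may print as either depending on how many decimals are kept; the text reports .84.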


Step 8 


Y-means can be adjusted directly for differences in the X-means
by use of the formula $M_{y.x} = M_y - b(M_x - \text{Gen. } M_x)$ * in which
the regression coefficient, b, is the $b_{within}$ of .87. $M_y$ is the original or
uncorrected Y-mean of a group; $M_x$ is the corresponding X-mean of
a group and Gen. $M_x$ is the mean of all X scores. It will be noted
that the B and C means receive more correction than the A mean,
which is only slightly changed.

$F_{y.x}$ tells us, it must be remembered (p. 294), that at least one of
our adjusted Y-means differs significantly from one other mean. To
determine which mean differences are significant we must first
compute the adjusted Y-means and then test these differences by the
t-test.


* See p. 292. $y - bx$ = adjusted value of y, or $M_y - bx = M_{y.x}$. Substitute
$x = (M_x - \text{Gen. } M_x)$ to give $M_{y.x} = M_y - b(M_x - \text{Gen. } M_x)$.

Step 9


The $Variance_{y.x}$ is 12 (Table 39) as compared with the $Variance_y$
of 38.2 and the $SD_{y.x}$ is √12 or 3.46. From formula (71) we find
that the standard error of the difference between any two means is
2.18. For 11 df, t is 2.20 at the .05 and 3.11 at the .01 level.
Substituting for t and $SE_D$ in the equation $t = D/SE_D$, we obtain
significant differences at the .05 level and .01 level of 4.80 and 6.78,
respectively. It is clear by reference to Step 8 that the adjusted A
mean is significantly higher than the B and C means (at the .01
level) but that B and C do not differ significantly. We may
conclude, therefore, that when initial differences are allowed for, praise
makes for significant changes in final score, but that scolding has
no greater effect than mere repetition of the test. Neither of these
last two factors makes for significant changes in test score.
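
Steps 8 and 9 can be put together in a short sketch. The group means, the b of .87, the adjusted variance of 12 and the two t values (2.20 and 3.11, Table D) are taken from the text; the variable names are illustrative. Since the sketch does not round √.4 to .63, the standard error of a difference comes out as 2.19 rather than 2.18, and the critical differences as about 4.82 and 6.81; the conclusions are unchanged.

```python
import math

B_WITHIN = 0.87
N_PER_GROUP = 5
general_mean_x = 185 / 15

group_means = {"A": (12, 22), "B": (15, 16), "C": (10, 13)}  # (Mx, My)

# Step 8: adjust each Y-mean for the group's departure from the general X-mean.
adjusted = {
    name: my - B_WITHIN * (mx - general_mean_x)
    for name, (mx, my) in group_means.items()
}

# Step 9: standard error of the difference between two adjusted means.
sd_yx = math.sqrt(12)
se_diff = sd_yx * math.sqrt(1 / N_PER_GROUP + 1 / N_PER_GROUP)
critical_05 = 2.20 * se_diff
critical_01 = 3.11 * se_diff

for first, second in (("A", "B"), ("A", "C"), ("B", "C")):
    diff = abs(adjusted[first] - adjusted[second])
    if diff >= critical_01:
        verdict = "significant at .01"
    elif diff >= critical_05:
        verdict = "significant at .05"
    else:
        verdict = "not significant"
    print(first, second, round(diff, 2), verdict)
```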


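Appendix to Chapter 10

(a) Calculation of $SS_w$ [Example (1), p. 274]

Columns
A: [64² + 72² + ... + 95²] − (ΣA)²/6 = 890
B: [... + 67²] − (ΣB)²/6 = 944
C: [... + 87²] − (ΣC)²/6 = 454
D: [... + 77²] − (ΣD)²/6 = 302
E: [... + 76²] − (ΣE)²/6 = 710
F: [...] − (ΣF)²/6 = [illegible]
G: [... + 82²] − (ΣG)²/6 = 1540
H: [...] − (ΣH)²/6 = 338

Total = 5666

(b) Derivation of the formula

$$SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x}$$

Let X = independent variable
    Y = dependent variable
    $r_{xy}$ = correlation between X and Y

Then $$\sigma^2_{y.x} = \sigma^2_y(1 - r^2_{xy}) = \sigma^2_y - \sigma^2_y r^2_{xy}$$ (p. 162)

and $$r^2_{xy} = \frac{(\Sigma xy)^2}{\Sigma x^2 \cdot \Sigma y^2}$$ (p. 139)

Substituting, and noting that $\Sigma y^2 = N\sigma^2_y$,

$$\sigma^2_{y.x} = \sigma^2_y - \frac{(\Sigma xy)^2}{N \Sigma x^2}$$

In terms of SS (multiplying through by N):

$$SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x}$$

(c) Derivation of formula

$$b = \frac{\Sigma xy}{\Sigma x^2}$$

$$b = r\frac{\sigma_y}{\sigma_x}$$ (p. 155)

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}$$

Substituting,

$$b = \frac{\Sigma xy}{N\sigma_x\sigma_y} \cdot \frac{\sigma_y}{\sigma_x} = \frac{\Sigma xy}{N\sigma^2_x} = \frac{\Sigma xy}{\Sigma x^2}$$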
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PROBLEMS 


1. In a learning experiment, 10 subjects are assigned at random to each of
six groups. Each group performs the same task but under slightly 
different experimental conditions. Do the groups differ in mean per- 
formance? 


1 2 3 4 5 6 
41 40 36 14 41 55 
40 36 33 38 35 36 
39 40 29 51 52 41 


41 39 39 36 10 36 

36 36 33 36 44 42 

35 34 32 32 26 42 

35 41 34 38 54 34 

37 37 34 36 30 40 Grand sum 
Sums 384 371 345 358 367 410 2235 


2. Solve problem (2), page 243, by the methods of analysis of variance.


3. Twenty subjects are paired on the basis of their initial scores on a test.
Ten (one member of each pair) are then assigned to an experimental and
10 to a control group. The experimental group is given special practice
and both groups are retested. Data for final scores are as follows:

                               Pairs of Subjects
                       1   2   3   4   5   6   7   8   9  10   Total
Control group         25  46  93  45  15  64  47  56  73  66    530
Experimental group    36  57  80  67  19  78  46  59  69  70    590

(a) Do the groups differ significantly in mean performance?
(b) Do subject-pairs differ significantly?
(c) Check the result in (a) by taking the difference between pairs of
scores, and testing the mean difference (by t-test) against null
hypothesis.

4. In the following table * the entries represent blood cholesterol readings
taken from 18 patients in April and in May.
(a) Is the rise from April to May significant?
(b) Are there significant individual differences, regardless of month?
(c) From the column of differences, compute $M_D$ and $SD_D$. Using the
t-test, measure the significance of $M_D$ against the null hypothesis.
Compare with the result in (a).


* Fertig, John W., “The Use of Interaction in the Removal of Correlated
Variation.” Biometric Bull., 1936, 1, 1-14.


Individual      April         May      Difference        Sum
     1          158.0       190.5         32.5          348.5
     2          158.5       177.0         18.5          335.5
     3          137.5       172.0         34.5          309.5
     4          145.5       152.5          7.0          298.0
     5          130.5       147.0         16.5          277.5
     6          141.0       127.0        −14.0          268.0
     7          150.5       149.5         −1.0          300.0
     8          142.5       152.5         10.0          295.0
     9          148.0       147.0         −1.0          295.0
    10          137.5       130.5         −7.0          268.0
    11          137.0       133.0         −4.0          270.0
    12          160.0       145.5        −14.5          305.5
    13          145.0       124.5        −20.5          269.5
    14          149.5       156.0          6.5          305.5
    15          145.0       143.5         −1.5          288.5
    16          132.5       146.0         13.5          278.5
    17          139.0       148.0          9.0          287.0
    18          151.0       161.0         10.0          312.0

Sum            2608.5      2703.0         94.5         5311.5

SS          379288.25   410872.00      4311.25     1576009.25


5. In an experiment by Mowrer,* previously unrotated pigeons were tested
for clockwise postrotational nystagmus. The rate of rotation was one
revolution in 14 sec. An average initial score for each pigeon based upon
2 tests is indicated by the symbol X. The 24 pigeons were then divided
into 4 groups of 6 each. Each group was then subjected to 10 daily
periods of rotation under one of the experimental conditions indicated
below. The rotation speed was the same as during the initial test and the
rotation periods lasted 30 sec., with a 30-sec. rest interval between each
period. Groups 1, 2 and 3 were practiced in a clockwise direction only.
For Group 4 the environment was rotated in a counterclockwise
direction. At the end of 24 days of practice, each group was tested again
under the same conditions as on the initial test. These records are
called Y.


* From Edwards, A. L., Experimental Design in Psychological Research (New
York: Rinehart, 1950), p. 357.

     Group 1              Group 2              Group 3              Group 4
 Rotation of body     Rotation of body     Rotation of body      Rotation of
 only. Vision         only. Vision         and environment       environment
 excluded             permitted                                  only

 Initial   Final      Initial   Final      Initial   Final      Initial   Final
    X        Y           X        Y           X        Y           X        Y
  23.8      7.9        28.5     25.1        27.5     20.1        22.9     19.9
  23.8      7.1        18.5     20.7        28.1     17.7        25.2     28.2
  22.6      7.7        20.3     20.3        35.7     16.8        20.8     18.1
  22.8     11.2        26.6     18.9        13.5     13.5        27.7     30.5
  22.0      6.4        21.2     25.4        25.9     21.0        19.1     19.3
  19.6     10.0        24.0     30.0        27.9     29.3        32.2     35.1

 134.6     50.3       139.1    140.4       158.6    118.4       147.9    151.1


(a) Test the significance of the differences among X-means. (Com- 
pute the among groups and within groups variance and use F-test.) 

(b) Do same as in (a) for the Y-scores. 

(c) By analysis of covariance test the differences among the adjusted 
means in Y. How much is the variance among Y-means reduced 
when X is held constant? 

(d) Compute the adjusted Y-means, $M_{y.x}$, by the method of p. 202.

(e) From the t-test find that difference among adjusted Y-means which 
is significant at the .05 level; at the .01 level. 


ANSWERS


1. No. F = 50.5/[illegible] or .93, and differences among means may be
   attributed entirely to sampling fluctuations.
2. F = 5.16 and t = 2.3 (√F)
3. (a) No. F = 180/35.3 = 5.10
   (b) Yes. F = 911.8/35.3 = 25.83
   (c) t = 2.26; t² = F = 5.10
4. (a) No. F = 248.0/112.2 = 2.21. df = 1/17 and F.05 = 4.45 (Table F)
   (b) Yes, just barely. F = 255.19/112.2 = 2.27. df = 17/17 and F.05 = 2.28
   (c) $M_D$ = 5.25; $SE_D$ = 3.53. t = 5.25/3.53 = 1.49; t² = 2.22; df = 17
5. (a) Difference among X-means not significant. $F_x$ = .81
   (b) Y-means differ significantly. $F_y$ = 341.4/24.9 = 13.7. For df of 3/20,
       F.01 = 4.94.
   (c) $F_{y.x}$ = 303.4/[illegible] = 15.3. Variance among Y-means is reduced 11%,
       from 341.4 to 303.4.
   (d) 9.3, 23.9, 18.6 and 24.9
   (e) 5.81; 7.26


12 


THE SCALING OF MENTAL TESTS AND OTHER
PSYCHOLOGICAL DATA


Various devices, many of them based upon the normal probability
curve, have been used in the scaling of psychological and educational
data. As used in mental measurement, a scale may be thought of as
a continuum or continuity along which items, tasks, problems and the
like have been located in terms of difficulty or some other attribute.
The units of a scale are arbitrary and depend upon the method
employed by the investigator. Ideally, scale units should be equal, have
the same meaning, and remain stable throughout the scale. Several
scaling procedures will be described in this chapter.


I. The Scaling of Test Items 
1. Scaling individual test items in terms of difficulty (σ-scaling)


We sometimes wish to construct a test which shall contain prob- 
lems or tasks graded in difficulty from very easy to very hard by 
known steps or intervals. If we know what proportion of a large 
group is able to solve each problem, it is comparatively easy to 
arrange our items in a percentage order of difficulty. Such an
arrangement constitutes a scale, to be sure, but a crude one, as
percentage differences are not satisfactory indices of differences in
difficulty (p. 314).

If we are justified in assuming normality in the trait being
measured, the variability (i.e., σ) of the group will give us a better scaling
unit than will percentage passing (p. 315). Test items may be “set”
or spaced in terms of σ-difficulty at definite points along a difficulty
continuum; their positions with respect to each other as well as with
respect to some reference point or “zero” are then known in terms of
a stable unit. To illustrate σ-scaling, suppose that we wish to
construct a scale for measuring “reasoning ability” (e.g., by means of
syllogisms) in 12-year-olds; or a scale for measuring mechanical
ingenuity in high-school juniors; or a scale for determining degree of
suggestibility in college freshmen. The steps in constructing such a
device may be outlined briefly as follows:


(1) Compile a large number of problems or other test items. These
    items should vary in difficulty from very easy to very hard and
    all sample the behavior to be tested.

(2) Administer the items to a large group drawn randomly from
    those for whom the final test is intended.

(3) Compute the percentage of the group which can solve each
    problem. Discard duplicate items and those too easy or too hard
    or unsatisfactory for other reasons.* Arrange the problems
    retained in an order of percentage difficulty. An item done
    correctly by 90% of the group is obviously less difficult than one
    solved by 75%; while the second problem is less difficult than
    one solved by only 50%. The larger the per cent passing, the
    lower the item in a scale of difficulty.

(4) By means of Table A convert the per cent solving each problem
    into a σ-distance above or below the mean. For example: an
    item done correctly by 40% of the group is 10% or .25σ above
    the mean. A problem solved by 78% is 28% (78% − 50%) or
    .77σ below the mean. We may tabulate the results for 5 items,
    taken at random, as follows (see Fig. 50):

    Problems                        A       B       C       D       E
    Per cent solving:              93      78      55      40      14
    Distance from the mean,
      in percentage terms:        −43     −28      −5      10      36
    Distance from the mean
      in σ-terms:               −1.48    −.77    −.13     .25    1.08

    Problem A is solved by 93% of the group, i.e., by the upper 50%
    (the right half of the normal curve) plus the 43% to the left of
    the mean. This puts Problem A at a point −1.48σ from the
    mean. In the same way, the percentage distance of each problem
    from the mean (measured in the plus or minus direction)
    can be found by subtracting the per cent passing from 50%.
    From these percentages, the σ-distance of the problem above or
    below the mean is read from Table A.


* Adkins, D. C., et al., Construction and Analysis of Achievement Tests
(Washington, D. C.: U. S. Government Printing Office, 1947), Chap. II.


(5) When the σ-distance of each item has been established, calculate
    the σ-distance of each item from the zero point of ability in
    the trait. A zero point may be located as follows: Suppose that
    5% of the entire group fail to solve a single problem. This
    would put the level of zero ability 45% of the distribution below
    the mean, or at a distance of −1.65σ from the mean.* The
    σ-value of each item in the scale may then be computed from
    this zero. To illustrate with the 5 problems above:

    Problems                        A       B       C       D       E
    σ-distance from mean:       −1.48    −.77    −.13     .25    1.08
    σ-distance from arbitrary
      zero, −1.65:                .17     .88    1.52    1.90    2.73

    The simplest way to find σ-distances from a given zero is to
    subtract the zero point algebraically from the σ-distance of each item
    from the mean. Problem A, for example, is −1.48 − (−1.65) or
    .17σ from the arbitrary zero; and Problem E is 1.08 − (−1.65)
    or 2.73σ from our zero.


* This is, of course, an arbitrary, not a true zero. It will serve, however, as a
reference point (level of minimum ability) from which to measure
performance. The point −3.00σ is often taken as a convenient reference point.

(6) When the distance of each item from the given zero has been
    determined, the difficulty value of each item with respect to the
    other items and with respect to zero is known and the scaling is
    finished. The next steps depend upon the purpose of the
    investigator. He may select items separated by fixed σ-distances (.5σ,
    say) to cover a wide range of talent. Or he may limit the range
    of talent from −2.50σ to 2.50σ, say, and not attempt to establish
    equal difficulty steps. Norms are derived from the final scale for
    age, grade, occupational or other groups.
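
The conversions of steps (4) and (5) can be sketched with the normal curve functions in the Python standard library, which here stand in for Table A. This is an illustration only: the function name sigma_difficulty is hypothetical, and because the library carries more decimals than the table, the zero point comes out as −1.645 rather than −1.65 and Problem B as .87 rather than .88 from the arbitrary zero.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sigma 1

def sigma_difficulty(percent_solving):
    """Sigma-distance of an item from the mean; harder items get larger values."""
    return STANDARD_NORMAL.inv_cdf(1 - percent_solving / 100)

percent_solving = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}

# Step (4): distance of each item from the mean in sigma-terms.
from_mean = {item: sigma_difficulty(p) for item, p in percent_solving.items()}

# Step (5): arbitrary zero where 5% of the group solve no problem at all.
zero_point = STANDARD_NORMAL.inv_cdf(0.05)
from_zero = {item: z - zero_point for item, z in from_mean.items()}

for item in percent_solving:
    print(item, round(from_mean[item], 2), round(from_zero[item], 2))
```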




2. Scaling total scores on a test 


In the last section we saw how individual test items can be scaled
in σ-units by assuming normality in the trait being measured. We
shall now describe two methods of scaling score totals or aggregates
of items, procedures generally followed in constructing aptitude and
achievement tests.


(1) σ-SCORES AND STANDARD SCORES


Let us suppose that the mean of a test is 122 and the σ is 24. Then
if John earns a score of 146 on this test, his deviation from the mean
is 146 − 122 or 24. Dividing John's deviation of 24 by the σ of the
test, we give him a σ-score of 24/24 or 1.00. If William's score is
110 on this test, his deviation from the mean is 110 − 122 or −12;
and his score in σ-units is −.5. Deviations from the mean expressed
in σ-terms are called σ-scores, z-scores, and reduced scores. Of these
designations, σ-score is certainly the most descriptive, but the other
terms are often used. We have already used the concept of a σ-score
in the problems in Chapter 5, p. 104.

The mean of a set of σ-scores is always 0 (the reference point)
and the σ is always unity or 1.00. As approximately half of the scores
in a distribution will lie below and half above the mean, about half
of our σ-scores will be negative and half positive. In addition,
σ-scores are often small decimal fractions and hence somewhat awkward
to deal with in computation. For these reasons, σ-scores are
usually converted into a new distribution with M and σ so selected as
to make all scores positive and relatively easy to handle. Such scores
are called standard scores. Raw test scores of the Army General
Classification Test, for example, are expressed as standard scores in
a distribution of M = 100 and σ = 20; sub-tests of the Wechsler-Bellevue
are converted into standard scores in a distribution of
M = 10 and σ = 3; and the tests of the Graduate Record Examination
into standard scores in a distribution of M = 500 and σ = 100.

The shift from raw to standard score requires a linear transformation.*
This transmutation does not change the shape of the distribution
in any way; if the original distribution was skewed (or normal),
the standard score distribution will be skewed or normal in exactly
the same fashion. The formula for conversion of raw to standard
score is as follows:


Let X = a score in the original distribution
    X' = a standard score in the new distribution
    M and M' = means of the raw score and standard score distributions
    σ and σ' = SD's of raw and standard scores

Then

\[ \frac{X' - M'}{\sigma'} = \frac{X - M}{\sigma} \]

or

\[ X' = \frac{\sigma'}{\sigma}(X - M) + M' \qquad (74) \]

(formula for converting raw scores to standard scores)

An illustration will show how the formula works.


Example (1) Given a distribution with Mean = 86 and σ = 15.
Tom's score is 91 and Mary's 83. Express these raw scores as
standard scores in a distribution with a mean of 500 and σ of 100.

By formula (74)

\[ X' = \frac{100}{15}(X - 86) + 500 \]

Substituting Tom's score of 91 for X we have

    X' = 6.67(91 − 86) + 500
       = 533

Substituting Mary's score of 83 for X,

    X' = 6.67(83 − 86) + 500
       = 480
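
The arithmetic of Example (1) can be checked with a short Python
sketch of formula (74). The function and argument names are
illustrative only and do not come from the text.

```python
def to_standard_score(x, mean, sd, new_mean, new_sd):
    """Formula (74): X' = (sd'/sd)(X - M) + M'."""
    return new_sd / sd * (x - mean) + new_mean


# Raw-score distribution of Example (1): Mean = 86, sigma = 15.
# Target distribution: mean of 500 and sigma of 100.
for name, raw in (("Tom", 91), ("Mary", 83)):
    print(name, round(to_standard_score(raw, 86, 15, 500, 100)))
```

The same function serves for any other scaling distribution by
changing `new_mean` and `new_sd`.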


In a distribution with a mean of 10 and a σ of 3, Tom's standard
score would be 11 and Mary's 9.4; in a distribution with a mean of
100 and a σ of 20, Tom's standard score would be 107 and Mary's 96.
Other scaling distributions may, of course, be employed.

* When the equation connecting two variables, y and x, is that of a
straight line, changing x's into y's involves a linear transformation.
Formula (74) is the equation of a straight line, analogous to the
general equation of a straight line, y = mx + b.

Scores made by the same individual upon several tests cannot
usually be compared directly owing to differences in test units. Thus
a score of 162 on a group intelligence test and a score of 126 on an
educational achievement examination cannot be compared meaningfully.
If scores like these are expressed as standard scores, however,
they can be compared provided the distributions of raw scores are of
the same form. Fortunately, most distributions of scores are so
nearly bell-shaped (p. 113) that no great error is made in treating
them as normal. When we can assume normality, a score of 1.00σ
on a mechanical aptitude test and a score of 1.00σ on a test of
mechanical interests represent the same relative degree of achievement:
both are exceeded by approximately 16% of those taking the
two tests (Table A). A problem will illustrate further this important
aspect of standard scores.


Example (2) Given a reading test with a mean of 81 and σ of
12; and an arithmetic test with a mean of 43 and a σ of 8. Sue's
score is 72 in reading and 37 in arithmetic. Assuming the distributions
of reading and arithmetic scores to be of the same form
(approximately normal), convert Sue's scores into a standard score
distribution with Mean = 100 and σ = 20 and compare them.

In the reading test Sue's score is 9 below the mean of 81. Hence, her
score is at −.75σ (−9/12) and her new score is 85 (100 − .75 × 20).
In arithmetic Sue's score is 6 points below the mean; again her
score is at −.75σ and her new score 85 (100 − .75 × 20). Sue's
two standard scores are comparable, and are also equivalent (represent
the same degree of achievement), if our assumption of normality of
distributions is tenable.
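
As a check on Example (2), the two conversions may be sketched in
Python. The arithmetic score is taken as 37, i.e., 6 points below the
mean of 43, as the worked solution requires; the function name is
illustrative only.

```python
def to_standard_score(x, mean, sd, new_mean=100, new_sd=20):
    """Express a raw score in a distribution with the given mean and SD."""
    return new_mean + new_sd * (x - mean) / sd


reading = to_standard_score(72, mean=81, sd=12)     # -.75 sigma in reading
arithmetic = to_standard_score(37, mean=43, sd=8)   # -.75 sigma in arithmetic
print(reading, arithmetic)
```

Both raw scores lie .75σ below their respective means, so both land
on the same standard score.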


(2) NORMALIZING THE FREQUENCY DISTRIBUTION; THE T-SCALE

Instead of being converted into standard scores, the raw scores of a
frequency distribution may be converted into a system of “normalized”
standard scores by transforming them into equivalent points in a normal
distribution. Equivalent scores (p. 306) are measures which
indicate the same level of talent. Suppose that, in a certain test,
20% of the group achieve scores better than 73. Now from Table A
we find that 20% of the area of the normal probability curve lies
above .84σ (30% falls between the mean and .84σ). Hence score 73
is equivalent to .84σ in the normal distribution, as both reflect the
same degree of achievement.
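
The Table A lookup in this illustration can be reproduced with the
inverse of the normal curve. The sketch below uses `NormalDist` from
Python's standard `statistics` module; the variable names are
illustrative only.

```python
from statistics import NormalDist

# 20% of the group score above 73, so 80% of the area lies below it.
proportion_below = 1 - 0.20

# sigma-point of the unit normal curve below which 80% of the area falls.
z = NormalDist().inv_cdf(proportion_below)
print(round(z, 2))
```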

Normalized standard scores are generally called T-scores. T-scaling
was devised by McCall * and first used by him in the construction
of a series of reading tests designed for use in the elementary grades.
The original T-scale was based upon the reading scores achieved by
500 12-year-olds; and the scores earned by other age groups on the
same reading test were expressed in terms of 12-year-old performance.
Since this first use of the method, T-scaling has been employed
with various groups and with different tests so that it no longer has
reference specifically to 12-year-olds nor to reading tests.

T-scores are normalized standard scores converted into a distribution
with a mean of 50 and σ of 10. In the σ-scaling of individual
items, the mean, as we know, is at zero and σ is 1.00. The point of
reference, therefore, is zero and the unit of measurement is 1. If the
point of reference is moved from the mean of the normal curve to a
point 5σ below the mean, this new reference point becomes zero in
the scale and the mean is 5. As shown in Figure 51, the σ-divisions
above the mean (1σ, 2σ, 3σ, 4σ, 5σ) become 6, 7, 8, 9 and 10; and the
σ-divisions below the mean (−1σ, −2σ, −3σ, −4σ, −5σ) are 4, 3, 2, 1
and 0. The σ of the distribution remains, of course, equal to 1.00.

[Figure 51: normal curve with three baseline scales: σ-scale, zero
point at mean; σ-scale, zero point at −5σ; T-scale, zero point at −5σ]

FIG. 51 To illustrate σ-scaling and T-scaling in a normal distribution

* McCall, William A., Measurement (New York: Macmillan, 1939), Chap. 22.

Only slight changes are needed in order to convert this σ-scale into
a T-scale. The T-scale begins at −5σ and ends at +5σ. But σ is
multiplied by 10 so that the mean is 50 and the other divisions are
0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100. The relationship of the
T-scale to the ordinary σ-scale is shown in Figure 51. Note that the
T-scale ranges from 0 to 100; that its unit, i.e., T, is 1 and that the
mean is 50. T, of course, equals .1 of σ which is equal to 10. The
reference point on the T-scale is set at −5σ in order to have the scale
cover exactly 100 units. This is convenient but it puts the extremes of
the scale far beyond the ability ranges of most groups. In actual
practice, T-scores range from about 15 to 85, i.e., from −3.5σ to 3.5σ.

The procedure to be followed in T-scaling a set of scores can best
be shown by an example. We shall outline the process in a series of
steps, illustrating each step by reference to the data of Table 40.


TABLE 40 To illustrate the calculation of T-scores

  (1)      (2)     (3)           (4)              (5)         (6)
 Test       f     Cum. f   Cum. Freq. below     Col. (4)    T-Scores
 Score                     Score + 1/2 on        in %'s
                            Given Score
  10        1      62           61.5             99.2         74
   9        4      61           59               95.2         67
   8        6      57           54               87.1         61
   7       10      51           46               74.2         56
   6        8      41           37               59.7         52
   5       13      33           26.5             42.7         48
   4       18      20           11               17.7         41
   3        2       2            1                1.6         29
         N = 62


(1) Compile a large and representative group of test items which
    vary in difficulty from easy to hard. Administer these items to a
    sample of subjects (children or adults) for whom the final scale
    is intended.

(2) Compute the per cent passing each item. Arrange the items in
    an order of difficulty in terms of these percentages.

(3) Administer the test to a representative sample and tabulate the
    distribution of total scores. Total scores may now be scaled as
    shown in Table 40 for 62 subjects. In column (1) the test scores
    are entered; and in column (2) are listed the frequencies (the
    number of subjects achieving each score). Two subjects had scores
    of 3, 18 had scores of 4, 13 scores of 5, and so on. In column (3)
    scores have been cumulated (p. 63) from the low to the high
    end of the frequency distribution. Column (4) shows the number
    of subjects who fall below each score plus one-half of those
    who earn the given score. The entries in this column may readily
    be computed from columns (2) and (3). There are no scores
    below 3 and 2 scores on 3, so that the number below 3 plus
    one-half on 3 equals 1. There are 2 scores below 4 [see column (3)]
    and 18 on 4 [column (2)]; hence the number of scores below
    4 plus one-half on 4 is 2 + 9 or 11. There are 20 scores below
    5 (2 + 18) and 13 scores on 5 [column (2)] so that the number
    below 5 plus one-half on 5 is 20 + 6.5 or 26.5. The reason why
    one-half of the frequency on a given score must be added to the
    frequency falling below that score is that each score is an
    interval, not a point on the scale. The score of 4, for example,
    covers the interval 3.5-4.5, midpoint 4.0. If the 18 frequencies
    on score 4 are thought of as distributed evenly over the interval,
    9 will lie below and 9 above 4.0, the midpoint. Hence, if we
    add 9 to the 2 scores below 4 (i.e., below 3.5) we obtain 11 as
    the number of scores below 4.0, the midpoint of the interval
    3.5-4.5. Each sum in column (4) is taken up to the midpoint
    of a score-interval.

(4) In column (5) the entries in column (4) are expressed as per
    cents of N (here 62). Thus, 99.2% of the scores lie below 10.0,
    midpoint of the interval 9.5-10.5; 95.2% of the scores lie below
    9.0, midpoint of 8.5-9.5, etc.

(5) Turn the per cents in column (5) into T-scores by means of
    Table G. T-scores in Table G corresponding to percentages
    nearest to those wanted are taken without interpolation, as
    fractional T-scores are a needless refinement. Thus for 1.6% we
    take 1.79% (T-score = 29); for 17.7% we take 18.41% (T-score
    = 41), and so on.

In Table G, percentages lying to the left of (i.e., below) succeeding
σ-points expressed as T-scores have been tabulated, rather than
per cents between the mean and given σ-points as in Table A. In
Table G, we are enabled, therefore, to read T-scores directly; but the
student will note that T-scores can also be read from Table A. To
illustrate with score 8 in Table 40, which has a
percentage-below-plus-one-half-reaching of 87.1, note that a score
failed by 87.1% lies 37.1% (87.1% − 50.0%) to the right of the mean.
From Table A, we read that 37.1% of the distribution lies between the
mean and 1.13σ. Since the σ of the T-scale is 10, 1.13σ becomes 11 in
T-units; and adding 11 to 50, the mean, we get 61 as the required
T-score (see Fig. 51).
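
The steps for T-scaling total scores can be sketched in Python as
follows. This is only an approximation of the tabular procedure: in
place of the nearest entry in Table G it takes the inverse of the
normal curve directly (`NormalDist.inv_cdf` from the standard
library), and the function name is illustrative. For the 62 scores of
Table 40 both routes lead to the same whole-number T-scores.

```python
from statistics import NormalDist

# Score -> frequency for the 62 subjects of Table 40.
freq = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}


def t_scores(freq):
    """Normalize a frequency distribution onto a scale of mean 50, SD 10."""
    n = sum(freq.values())
    unit_normal = NormalDist()
    below = 0          # cumulative frequency below the current score
    result = {}
    for score in sorted(freq):
        f = freq[score]
        # Proportion below the midpoint of the score interval:
        # those below the score plus one-half of those on it.
        proportion = (below + f / 2) / n
        z = unit_normal.inv_cdf(proportion)
        result[score] = round(50 + 10 * z)
        below += f
    return result


print(t_scores(freq))
```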


FIG. 52 Histogram of the sixty-two scores in Table 40

FIG. 53 Normalized distribution of the scores in Table 40 and
Figure 52. Original scores and T-score equivalents are shown
on baseline


Figure 52 shows a histogram plotted from the distribution of 62
scores in Table 40. Note that the scores of 3, 4, 5, etc., are spaced at
equal intervals along the baseline, i.e., along the scale of scores.
When these raw scores are transformed into normalized standard
scores (into T-scores) they occupy the positions in the normal curve
shown in Figure 53. The unequal scale distances between the scores
in Figure 53 show clearly that, when normality is forced upon a
trait, the original scores do not represent equal difficulty steps. In
other words, normalizing a distribution of test scores alters the
original test units (stretching them out or compressing them) and the
more skewed the raw score distribution, the greater is the change
in unit.

T-scores have general applicability, a convenient unit, and cover
a wide range of talent. Besides these advantages, T-scores from
different tests are comparable and have the same meaning, since
reference is always to a standard scale of 100 units based upon the
normal probability curve. T-scaling forces normality upon the scores
of a frequency distribution and is unwarranted if the distribution of
the trait in the population is not normal. For the distributions of
most mental abilities in the population, however, normality is a
reasonable, and is often the only feasible, assumption.


(3) A COMPARISON OF T-SCORES AND STANDARD SCORES 


T-scores are sometimes confused with standard scores, but the
assumptions underlying the two sorts of measures are quite different.
Table 41 repeats the data of Table 40, and shows the T-score equivalents
to the given raw scores. Standard scores with a mean of 50 and
σ of 10 are listed in column (4) for comparison with the T-scores.


TABLE 41 Comparison of T-scores and standard scores

(Data from Table 40)

  (1)      (2)      (3)           (4)
 Test       f     T-Scores   Standard Scores
 Score                       (M = 50, σ = 10)
  10        1       74             75
   9        4       67             69
   8        6       61             63
   7       10       56             57
   6        8       52             52
   5       13       48             46
   4       18       41             40
   3        2       29             34
         N = 62

Equation for converting test scores into standard scores (see p. 306)

For test scores: M = 5.73, σ = 1.72

\[ \frac{X' - 50}{10} = \frac{X - 5.73}{1.72} \]

\[ X' = \frac{10X}{1.72} - \frac{57.3}{1.72} + 50 \]

\[ X' = 5.82X - 33.3 + 50 \]

\[ X' = 5.82X + 16.7 \]


These standard scores were calculated by means of formula (74) on
page 306. The mean of the raw scores is 5.73 and the σ is 1.72; and
the mean of the “new” standard score distribution is, of course, 50,
with σ of 10. Substituting these values in formula (74) we have

    X' = 5.82X + 16.7

as our transformation equation. Putting 3, 4, 5, etc., for X in this
equation we find X''s of 34, 40, 46, etc. These X' scores will be found
to correspond fairly closely to the T-scores. This is often the case,
and the more nearly normal the distribution of raw scores the closer
the correspondence. The two kinds of scores are not interchangeable,
however. With respect to original scores, T-scores represent equivalent
scores in a normal distribution. Standard scores, on the other
hand, always have the same form of distribution as raw scores, and
are simply original scores expressed in σ-units. Standard scores
represent the kind of conversion we make when we change inches to
centimeters or kilograms to pounds; that is, the transformation is
linear. Standard scores correspond exactly to T-scores when the
distribution of raw scores is strictly normal.
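
The linear standard scores of this comparison can be recomputed from
the frequency distribution itself. The Python sketch below is
illustrative only (variable names are not from the text); it computes
the mean and σ of the 62 scores and applies formula (74) with a new
mean of 50 and σ of 10.

```python
from math import sqrt

# Score -> frequency for the 62 subjects (data of Table 40).
freq = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}

n = sum(freq.values())
mean = sum(x * f for x, f in freq.items()) / n
sd = sqrt(sum(f * (x - mean) ** 2 for x, f in freq.items()) / n)

# Formula (74) with M' = 50 and sigma' = 10, rounded to whole numbers.
standard = {x: round(50 + 10 * (x - mean) / sd) for x in sorted(freq)}

print(round(mean, 2), round(sd, 2))
print(standard)
```

Unlike the T-scores, these values rise by a constant step from one
raw score to the next, since the transformation is a straight line.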


(4) PERCENTILE SCALING 


A child who earns a certain score on a test can be assigned a
percentile rank (PR) * of 27, 42 or 77, say, depending upon his position
in the score distribution. Percentile rank locates a child on a scale
of 100, and tells us immediately what proportion of the group has
achieved scores lower than he. Moreover, when a child has taken
several tests, a comparison of his PR's provides measures of relative
achievement, which may be combined into a final total score. As a
method of scaling test scores, PR's have the practical advantage of
being readily calculated and easily understood. But the percentile
scale also possesses marked disadvantages which limit its usefulness.

Percentile scales assume that the difference between a rank of 10
and a rank of 20 is the same as the difference between a rank of 40
and a rank of 50, namely, that percentile differences are equal
throughout the scale. This assumption of equal percentile units holds
strictly only when the distribution of scores is rectangular in shape;
it does not hold when the distribution is bell-shaped, or approximately
normal. Figure 54 shows graphically why this is true. In the
diagram we have a rectangular distribution and a normal curve of
the same area plotted over it. When the rectangle is divided into 5
equal segments, the areas of the small rectangles are all the same
(20%) and the distances from 0 to 20, 20 to 40, 40 to 60, 60 to 80, and
80 to 100 are all equal. These percentiles, P20, P40, etc., have been
marked off along the top of the rectangle.

FIG. 54 To illustrate the position of the same five percentiles
in rectangular and normal distributions

* For method of computing PR's, see p. 68.

Now let us compare the distances along the baseline of the normal
curve when these are determined by successive 20% slices of area.
These baseline intervals can be found in the following way. From
Table A we read that the 30% of area to the left of the mean extends
to −.84σ. The first 20% of a normal distribution, therefore, falls
between −3.00σ and −.84σ: covers a distance of 2.16σ along the
baseline. The second 20% (P20 to P40) lies between −.84σ and −.25σ
(since −.25σ is at a distance of 10% from the mean); and covers a
distance of .59σ along the baseline. The third 20% (P40 to P60) lies
between −.25σ and .25σ: straddles the mean and covers .50σ on the
baseline. The fourth and fifth 20%'s occupy the same relative positions
in the upper half of the curve as the second and first 20%'s
occupy in the lower half of the curve. To summarize:

    First 20% of area covers a distance of 2.16σ
    Second 20% of area covers a distance of .59σ
    Third 20% of area covers a distance of .50σ
    Fourth 20% of area covers a distance of .59σ
    Fifth 20% of area covers a distance of 2.16σ

It is clear (1) that intervals along the baseline from the extreme
left end (0 to P20, P20 to P40, etc.) to the extreme right end of the
normal curve are not equal when determined by successive 20%
slices of area; and (2) that inequalities are relatively greater at the
two ends of the distribution, so that the two end fifths are 4 times as
long as the middle one.
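
The baseline distances summarized above can be recomputed with a
short Python sketch. It assumes, as the text does, that the curve is
cut off at −3.00σ and +3.00σ; the function name is illustrative. The
cutting points come from `NormalDist.inv_cdf` rather than from
Table A, so the middle fifth comes out at .51σ instead of the .50σ
obtained from the rounded tabled value of .25σ.

```python
from statistics import NormalDist


def slice_widths(n_slices=5, limit=3.0):
    """Baseline width, in sigma-units, of equal-area slices of a normal curve."""
    unit_normal = NormalDist()
    inner_cuts = [unit_normal.inv_cdf(k / n_slices) for k in range(1, n_slices)]
    cuts = [-limit] + inner_cuts + [limit]
    return [round(upper - lower, 2) for lower, upper in zip(cuts, cuts[1:])]


widths = slice_widths()
print(widths)
```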

Distributions of raw scores are rarely if ever rectangular in form. 
Hence equal per cents of N (area) cannot be taken to represent equal 
increments of achievement and the percentile scale does not progress 
by equal steps. Between Q1 and Q3, however, equal per cents of area
are more nearly equally spaced along the baseline (see Fig. 54), so 
that the PR’s of a child in two or more tests may be safely combined 
or averaged if they fall within these limits. But high and low PR’s 
(above 75 and below 25) should be combined, if at all, with full 
knowledge of their limitations. 


TABLE 42 Percentile distributions for nine-year-olds on three tests

Method of Combining the Percentile Ranks of a Single Individual

                                      Percentiles                          S's   Perc.
Tests                 0   10   20   30   40   50   60   70   80   90  100  Score  Rank
Picture Completion   62  240  297  325  372  407  440  450   49  577  646   445    65
Substitution         19  190  173  158  152  141  133  126   11  109   80   126    70
Seguin Form-Board    34   24   21   20   18   18   17   16   15   15   ..    17    60

Median Percentile Rank .................................................... 65


Table 42 gives an illustration of the value of percentile scaling
when tests scored in different units are to be compared and combined.
Percentile distributions for 9-year-olds are shown for three tests
from the Pintner-Paterson Scale of Performance Tests.* The subject,
a 9-year-old boy, made a score of 445 on the Completion Test
which gave him a PR of 65 (midway between 60 and 70). On the
Substitution Test, a score of 126 gave him a PR of 70; and on the
Seguin Form Board a score of 17 gave him a PR of 60. The scores in
the last two tests are in time units (seconds) so that the lowest
scores numerically represent the highest performance. The median
of this boy's PR's is 65, indicating that he stands somewhat above
the average of 9-year-olds. Since none of these PR's is extremely
high or low, they may be combined with little error.
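
The combination of this boy's PR's can be sketched in Python. Only
figures quoted in the text are used: a Completion score of 445 lying
between the 60th percentile (440) and the 70th (450), and PR's of 70
and 60 on the other two tests. The helper name is illustrative, and
straight-line interpolation between neighboring percentiles is an
assumption of the sketch.

```python
from statistics import median


def interpolate_pr(score, lower, upper):
    """Linear interpolation between two (score, percentile) points."""
    (score_lo, pr_lo), (score_hi, pr_hi) = lower, upper
    fraction = (score - score_lo) / (score_hi - score_lo)
    return pr_lo + fraction * (pr_hi - pr_lo)


completion_pr = interpolate_pr(445, (440, 60), (450, 70))
combined = median([completion_pr, 70, 60])
print(completion_pr, combined)
```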




* Pintner, R., and Paterson, D. G., A Scale of Performance Tests (New York:
D. Appleton & Co., 1925), pp. 189, 197.


II. The Scaling of Judgments


1. Converting judgments into normal curve units (product scales) 


We have seen in the last section how test scores may be scaled on 
the principle that the σ-value determined from the percentage passing
a given item is an acceptable index of difficulty. It often happens,
however, that the ability or trait in which we are interested is of such 
a nature that achievement cannot be expressed by a test score. This 
necessitates the construction of what are called product scales. In 
such scales excellence of performance is evaluated by comparing an 
individual’s production with various “standard productions” the 
values of which have been determined beforehand by a consen- 
sus of expert judgment. Handwriting, compositions, and drawings 
are well-known examples of product scales. The excellence of a person's
penmanship, for example, can be determined by comparing a
sample of his writing with various specimens of handwriting, the 
quality of which has been measured against some criterion. 

Product scales are constructed on the principle that “equally often 
noticed differences” in quality are equal. If composition A, for example,
is rated better than composition B by 75% of a group of competent
judges, and composition X is rated better than composition Y
by 75% of the same judges, then the difference between A and B is
taken to be the same as the difference between X and Y (because
equally often observed).

The assumption that “equally often noticed differences are equal”
has been criticized * and is most doubtful when applied to the scaling
of items at the extremes of the qualitative range. The variability of
judgments upon extremely good or extremely poor specimens will
ordinarily be less than the range of judgments made upon intermediate
specimens. In most product scales the accurate measurement of
these extreme specimens is, perhaps, not so important as is the
accurate scaling of those items which constitute the main body of the
scale. For this reason, the assumption that equally often noticed
differences are equal will give scales which are just as valuable
practically as those resulting from the use of more refined techniques.

* Thurstone, L. L., “Equally Often Noticed Differences,” Journal of
Educational Psychology, 1927, 18, 290-95.
Thurstone, L. L., “Psychophysical Analysis,” American Journal of
Psychology, 1927, 38, 368-389.

Steps in constructing a product scale may be set down as follows:

(1) Collect a large number of samples of the product to be scaled
    (e.g., handwriting, drawings, jokes, pictures). These specimens
    should range by gradual stages from very poor to excellent.

(2) Persuade a number of competent persons to act as judges of the
    comparative excellence of the specimens. Instruct these judges
    to compare every specimen with every other specimen, so that a
    consensus may be obtained on each. The order of merit method,
    the paired comparisons method, or some variation of these,
    should ordinarily be employed here, as these experimental
    techniques provide a systematic attack upon the problem of ranking
    samples for excellence.*

(3) Reduce the number of times each specimen is ranked above each
    other specimen to percentage terms, and express these percents
    as σ-distances between each pair of specimens. To illustrate, if
    drawing A is judged better than drawing B by 65% of the group,
    A − B = .39σ; if B is judged better than C by 77%, B − C
    = .74σ. These σ-differences are read from Table A and are found
    in the following way: If a sample is judged better than another
    by just 50%, there is no observable difference between the two
    and their σ-difference is zero. But if A is judged better than B
    by 65%, the difference between A and B (in excess of chance)
    is 15%, which from Table A corresponds to a σ-difference of .39.
    In exactly the same way the difference between B and C (in
    excess of chance) is 27%, which corresponds to a σ-difference of
    .74. Figure 55 shows graphically how percentage differences can
    be converted into σ-differences. The distributions of judgments
    upon A, B, and C are assumed to be normal and are taken to be
    equal in range and variability. The mean value of A (its scale
    value) is .39σ above the mean value of B, the mean value of
    which is, in turn, .74σ above the mean value of C.

(4) Determine a σ-difference for each pair of specimens, and express
    each item finally selected for the scale as so many σ-units from
    the arbitrary zero. The procedure may be illustrated by two
    items, numbers eight and nine, taken from the Hillegas
    Composition Scale.† Hillegas had each of 202 judges arrange a
    number of English compositions in order of merit. An artificial
    composition was selected as being of just zero merit, and
    assigned the value of 0 on the scale. Of the 202 judges, 136 or
    67.33% ranked specimen 9 as better than specimen 8. From
    Table A, we find that a percentage difference of 17.33 (67.33
    − 50) indicates a PE difference of .65, and this value expresses
    the amount by which 9 is better than 8. The value of specimen 8
    had already been found to be 7.72PE * above the zero point on
    the scale. Hence, specimen 9 is 7.72 + .65 or 8.37PE above the
    zero composition. The values of the nine compositions on the
    Hillegas Scale as measured in PE units from the zero composition
    are 1.83, 2.60, 3.69, 4.74, 5.85, 6.75, 7.72, 8.37, and 9.37.
    Note that the steps on the scale are fairly regular and are about
    1 PE apart.

FIG. 55 To illustrate σ-scale differences between specimens A, B, and C.
The distributions of judgments on the three specimens are taken
to be normal, and equal in range and variability

* Woodworth, R. S., Experimental Psychology (New York: Henry Holt
& Co., 1938), pp. 372-318.
† Hillegas, Milo B., A Scale for the Measurement of Quality in English
Composition by Young People, Teachers College Record, 1912, 13, 4, 5-55.

2. Transforming qualitative data into numerical scores 


It is possible to express many kinds of qualitative data in quantitative
terms, if we can assume that measures of the trait or ability
which we have sampled are normally distributed in the population.
Several techniques based upon the normal curve will be considered
in this section.

* The PE was the unit used by Hillegas. PE = .6745σ, p. 97.


(1) THE SCALING OF ANSWERS TO A QUESTIONNAIRE


Answers to the queries or statements in most questionnaires admit
of several possible replies, such as Yes, No, ?; or Most, Many, Some,
Few, No; or there are four or five answers one of which is to be
checked. It is often desirable to “weight” these different alternatives
in accordance with the degree of divergence from the “typical
answer” which they indicate. First we assume that the attitude or
personality trait expressed in answering a given proposition is
normally distributed. From the percentage who accept each alternative
answer to a question or statement, we may then find a σ-equivalent,
which will express the value or weight to be given that answer.
Likert's * Internationalism Scale furnishes an example of this scaling
technique. This questionnaire contains 24 statements upon each
of which the subject is requested to give an opinion. Approval or
disapproval of any statement is indicated by checking one of five
possibilities: “strongly approve,” “approve,” “undecided,”
“disapprove,” and “strongly disapprove.” The method of scaling as
applied to statement No. 16 on the Internationalism Scale is shown in
Table 43. This statement reads as follows:

    16. All men who have the opportunity should enlist in the
    Citizens' Military Training Camps.
    Strongly approve   Approve   Undecided   Disapprove
    Strongly disapprove


TABLE 43 Data for statement No. 16 of the Internationalism Scale

Answers               Strongly   Approve  Undecided  Disapprove   Strongly
                      Approve                                     Disapprove
Per cent checking        13         43        21         13          10
Equivalent σ-values    −1.63       −.43       .43        .99        1.76
Standard scores          34         46        54         60          68


The percentage selecting each of the possible answers is shown in
the table. Below the percent entries are the σ-equivalents assigned
to each alternative on the assumption that opinion on the question
is normally distributed, that is, that few will wholeheartedly agree
or disagree, and many take intermediate views. The σ-values in Table 43
have been obtained from Table H (p. 435) in the following way:
Reading down the first column headed 0, we find that beginning at
the upper extreme of the normal distribution, the highest 10% has an
average σ-distance from the mean of 1.76. Said differently, the mean
of the 10% of cases at the upper extreme of the normal curve is at a
distance of 1.76σ from the mean of the whole distribution. Hence,
the answer “strongly disapprove” is given a σ-equivalent of 1.76
(see Fig. 56).

FIG. 56 To illustrate the scaling of the five possible answers to statement
16 on Likert's Internationalism Scale

* Likert, R., A Technique for the Measurement of Attitudes, Archives of
Psychology, 1932, No. 140.


To find the σ-value for the answer “disapprove,” we select the
column headed 10 and running down the column take the entry
opposite 13; namely, .99. This means that when 10% of the distribution
reading from the upper extreme have been accounted for, the
average distance from the mean of the next 13% is .99σ. Reference
to Figure 56 will make this clearer. Now from the column headed
23 (13% + 10% “used up” or accounted for), we find entry .43 opposite
21. This means that when the 23% at the upper end of the distribution
have been cut off, the mean σ-distance from the general
mean of the next 21% is .43σ, which becomes the weight of the
preference “undecided.” The weight of the fourth answer “approve” must
be found by a slightly different process. Since a total of 44% from
the upper end of the distribution have now been accounted for, 6%
of the 43% who marked “approve” will lie to the right of the mean
and 37% to the left of the mean, as shown in Figure 56. From the
column headed 44 in Table H, we take .08 (entry opposite 6%)
which is the average distance from the general mean of the 6% lying
just above the mean. Then from the column headed 13 (50% − 37%)
we take entry .51 (now −.51) opposite 37%, as the mean distance
from the general mean of the 37% just below the mean. The
weighted mean of these two values,

\[ \frac{-.51 \times .37 + .08 \times .06}{.43} = -.43, \]

is the weight assigned to the preference “approve.” The 13% left,
those marking “strongly approve,” occupy the 13% at the extreme (low
end) of the curve. Returning to the column headed 0, we find that the
mean distance from the general mean of the 13% at the extreme of the
distribution is −1.63σ.

In order to avoid negative values, each o-weight in Table 43 can 
be expressed as a o-distance from —3.00с (or —5.00с). If referred 
to —3.006, the weights become in order 1.37, 2.57, 3.43, 3.99, and 4.76. 
Dropping decimals, and taking the first two digits, we could also as- 
sign weights of 14, 26, 34, 40, and 48. Again each c-value in Table 43 
may be expressed as а standard score in a distribution the mean of 
which is 50 and the o 10. The category “strongly approve" is 
—16(—1.63 X 10) from the mean of 50, or at 34. Category “ap- 
prove” is —4(—.43 X 10) from 50 or at 46. The other three cate- 
gories have standard scores of 54, 60, and 68. 

When all 24 statements on the Internationalism Scale have been 
scaled as shown above, а person’s “score” (his attitude toward inter- 
nationalism in general) is found by adding up the weights assigned 
to the various preferences which he has selected. An individual 
whose opinions are extreme, е.5., who tends strongly to disapprove 
many statements, will receive a proportionally larger. total score 
when the choices are o-scaled than he would receive if the five pos- 
sibilities were assigned arbitrary weights of 1, 2, 3,4, and 5. It has 
been shown, however, that o-scaling yields results which, for the test 
as a whole, are little if any more reliable or more discriminatory than 
the results obtained when the five answers are scored simply 1, 2, 3, 
4, and 5. This virtual equality of scaling and rule-of-thumb method 
is a rather familiar finding in mental measurement. In the present 
instance, it probably arises from the fact that the greater differenti- 
ation which the o-scaling technique provides for single items is lost in 
the process of adding or averaging the score weights from many 
items. A real advantage of o-scaling is that the units of the scale 


.43, which is the weight 


algebraic sum 
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are equal and may be compared from item to item or from scale to 
scale. Also, o-scaling gives a more accurate picture of the extent to 
which extreme or biased opinions on a given question are divergent 
from the typical opinion than does the arbitrary weighting method. 
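
The Table H look-ups described above can be approximated directly from
the normal curve. The sketch below is an illustration only, not Table H
itself: the function name `sigma_weights` is hypothetical, and it
assumes that each answer's weight is the mean σ-distance of its slice
of a unit normal distribution, with the percents listed from the upper
extreme downward.

```python
from statistics import NormalDist

def sigma_weights(percents):
    """Mean sigma-distance of each successive slice of a unit normal curve.

    `percents` are whole-number percents listed from the upper extreme
    downward and are assumed to sum to 100 (an assumption of this sketch).
    """
    nd = NormalDist()

    def ordinate(cum_percent):
        # Height of the curve at the point cutting off `cum_percent` of
        # the area from the upper end; zero at either extreme.
        if cum_percent <= 0 or cum_percent >= 100:
            return 0.0
        return nd.pdf(nd.inv_cdf(1 - cum_percent / 100))

    weights, used = [], 0
    for pct in percents:
        # Mean of a slice = difference of its bounding ordinates / its area.
        weights.append((ordinate(used + pct) - ordinate(used)) / (pct / 100))
        used += pct
    return weights

# Percents for statement 16, from "strongly disapprove" to "strongly approve".
likert = sigma_weights([10, 13, 21, 43, 13])
print([round(w, 2) for w in likert])
```

For these five percents the computed weights fall within about .01 of
the values read from Table H in the text (1.76, .99, .43, −.43, −1.63);
small differences of this size reflect rounding in a printed table.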


(2) THE SCALING OF RATINGS 


In many psychological problems individuals are rated or ranked 
for their possession of characteristics or attributes not readily meas- 
ured in terms of performance. Honesty, interest in one’s work, tact- 
fulness, originality, are illustrations of such traits. Suppose that two 
teachers A and B have rated a group of forty pupils for “social re- 
sponsibility” on a 5-point scale. A rating of 1 means that the trait 
is possessed in marked degree, a rating of 5 that it is almost if not 
entirely absent, and ratings of 2, 3, and 4 indicate intermediate 
degrees. Assume that the percentage of children assigned each rating
is as follows:


            Social Responsibility
Rating          A          B
  1            10%        20%
  2            15%        40%
  3            50%        20%
  4            20%        10%
  5             5%        10%


It is obvious that B rates more leniently than A, so that a rating
of 1 by B may not represent the same degree of “social responsibil-
ity” as a rating of 1 by A. Can we assign “weights” or numerical
scores so as to make the ratings of the two teachers comparable?
The answer is “yes,” provided we can assume that the distribution
of the trait “social responsibility” is normal, and that one teacher is
as competent a judge as the other. From Table H, we may read
σ-equivalents to the percents given each rating by A and B as
follows:


Rating          A          B
  1            1.76       1.40
  2             .95        .27
  3             .00       −.53
  4           −1.07      −1.04
  5           −2.10      −1.76


These σ-values are read from Table H in exactly the same way as
were the σ-equivalents in the previous problem (p. 431). If we
assume −3.00σ as an arbitrary reference point, the σ-values for the
ratings of A and B all become positive:


Rating          A          B
  1            4.76       4.40
  2            3.95       3.27
  3            3.00       2.47
  4            1.93       1.96
  5             .90       1.24


Dropping decimals, and taking only the first two digits, A’s and B’s 
ratings become: 


Rating A B 
1 48 44 
2 40 33 
3 30 25 
4 19 20 
5 9 12 


or, expressed as standard scores in a distribution with a mean of 50
and a σ of 10,


Rating          A          B
  1            68         64
  2            60         53
  3            50         45
  4            39         40
  5            29         32


The ratings of A and B may be combined by adding or by averag- 
ing them. 
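
The two conversions used above (distance from an arbitrary zero at
−3.00σ, and standard scores with mean 50 and σ 10) are simple linear
transformations. The following sketch shows them under stated
assumptions: the helper names `to_positive` and `to_standard_score`
are hypothetical, and the σ-values are those read from Table H in the
text for teachers A and B.

```python
def to_positive(sigma_value, zero_point=-3.0):
    """Express a sigma-value as a distance from an arbitrary zero point."""
    return sigma_value - zero_point

def to_standard_score(sigma_value, mean=50, sd=10):
    """Express a sigma-value as a standard score with the given mean and SD."""
    return mean + sd * sigma_value

# Sigma-equivalents for ratings 1-5, as read from Table H in the text.
teacher_a = [1.76, 0.95, 0.00, -1.07, -2.10]
teacher_b = [1.40, 0.27, -0.53, -1.04, -1.76]

for name, sigmas in (("A", teacher_a), ("B", teacher_b)):
    positive = [round(to_positive(s), 2) for s in sigmas]
    standard = [round(to_standard_score(s)) for s in sigmas]
    print(name, positive, standard)
```

Because both conversions are linear, they change only the origin and
unit of the scale; the relative spacing of the five ratings is the same
in every version.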

Table H will prove valuable in enabling one to transmute many
kinds of qualitative data into quantitative terms or scores. Almost
any attribute upon which relative judgments can be obtained may be
assigned scores in a normal distribution in terms of the σ of the
judgments.


(3) CHANGING ORDERS OF MERIT INTO NUMERICAL SCORES


It is often desirable to transmute orders of merit into units of 
amount or “scores.” This may be done by means of tables, if we are 
justified in assuming normality for the trait. To illustrate, suppose 
that 15 salesmen have been ranked in order of merit for selling effi- 
ciency, the most efficient salesman being ranked 1, the least efficient 
being ranked 15. If we are justified in assuming that "selling effi- 
ciency" follows the normal probability curve in the general popula- 
tion we can, with the aid of Table 44 (p. 324), assign to each man a 
“selling score" on a scale of 10 or of 100 points. Such a score will 
define ability as a salesman better than will a rank of 2, 5, or 14. 
The problem may be stated specifically as follows:

Example (1) Given 15 salesmen, ranked in order of merit by
their sales manager, (a) transmute these rankings into scores on a
scale of 10 points; (b) a scale of 100 points.


First, by means of the formula

$$\text{Percent position} = \frac{100(R - .5)}{N} \qquad (75)$$

(formula for converting ranks into percents of the normal curve)

in which R is the rank of the individual in the series * and N is the
number of individuals ranked, determine the “percent position” of
each man. Salesman A, who ranks No. 1, has a percent position of
100(1 − .5)/15 or 3.33, and his score from Table 44 is 9 or 85 (finer
interpolation unnecessary). Salesman B, who ranks No. 2, has a
percent position of 100(2 − .5)/15 or 10, and his score, accordingly,
is 8 or 75. The scores of the other salesmen, found in exactly the
same way, are given in Table 45.
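
Formula (75) is easy to check by machine. The sketch below is a
minimal illustration; the function name `percent_position` is a
hypothetical label, not a term from the text.

```python
def percent_position(rank, n):
    """Formula (75): percent position of rank `rank` among `n` individuals."""
    # .5 is subtracted so that the midpoint of the rank interval is used.
    return 100 * (rank - 0.5) / n

# Percent positions of the 15 salesmen in Example (1).
positions = [round(percent_position(r, 15), 2) for r in range(1, 16)]
print(positions)
```

The first two values, 3.33 and 10.0, are the percent positions worked
out above for salesmen A and B.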


TABLE 44 The transmutation of orders of merit into units of amount or
“scores” †

Example: If N = 25, and R = 3, Percent Position is 100(3 − .5)/25 or 10
(formula (75)) and from the table, the equivalent rank is 75, on a
scale of 100 points.

Percent  Score     Percent  Score     Percent  Score
   .09    99        22.32    65        83.31    31
   .20    98        23.88    64        84.56    30
   .32    97        25.48    63        85.75    29
   .45    96        27.15    62        86.89    28
   .61    95        28.86    61        87.96    27
   .78    94        30.61    60        88.97    26
   .97    93        32.42    59        89.94    25
  1.18    92        34.25    58        90.83    24
  1.42    91        36.15    57        91.67    23
  1.68    90        38.06    56        92.45    22
  1.96    89        40.01    55        93.19    21
  2.28    88        41.97    54        93.86    20
  2.63    87        43.97    53        94.49    19
  3.01    86        45.97    52        95.08    18
  3.43    85        47.98    51        95.62    17
  3.89    84        50.00    50        96.11    16
  4.38    83        52.02    49        96.57    15
  4.92    82        54.03    48        96.99    14
  5.51    81        56.03    47        97.37    13
  6.14    80        58.03    46        97.72    12
  6.81    79        59.99    45        98.04    11
  7.55    78        61.94    44        98.32    10
  8.33    77        63.85    43        98.58     9
  9.17    76        65.75    42        98.82     8
 10.06    75        67.48    41        99.03     7
 11.03    74        69.39    40        99.22     6
 12.04    73        71.14    39        99.39     5
 13.11    72        72.85    38        99.55     4
 14.25    71        74.52    37        99.68     3
 15.44    70        76.12    36        99.80     2
 16.69    69        77.68    35        99.91     1
 18.01    68        79.17    34       100.00     0
 19.39    67        80.61    33
 20.93    66        81.99    32

* A rank is an interval on a scale; .5 is subtracted from each R because its
midpoint best represents an interval. E.g., R = 5 is the 5th interval, namely
4-5, and 4.5 (or 5 − .5) is the midpoint.

† From Hull, C. L., “The Computation of Pearson's r from Ranked Data,”
Journal of Applied Psychology, 1922, 6, pp. 385-390.


It has been frequently pointed out that the assumption of normal-
ity in a trait implies that differences at extremes of the trait are rela-
tively much greater than differences around the mean. This is clearly
brought out in Table 45; for, while all differences in the order of
merit series equal 1, the differences between the transmuted scores
vary considerably. The largest differences are found at the ends of
the series, the smallest in the middle. For example, the difference in
score between A and B or between N and O (on a scale of 100) is
three times the difference between G and H. Clearly, it is three
times as hard for a salesman to improve sufficiently to move from
second to first place as it is to move from eighth to seventh place.


TABLE 45 The order of merit ranks of 15 salesmen converted into nor-
mal curve “scores”

            Order of     Percent     Scores (Table 44)
Salesmen   Merit Ranks   Position   Scale (10)  Scale (100)   PR's
   A            1          3.33         9           85         97
   B            2         10.00         8           75         90
   C            3         16.67         7           69         83
   D            4         23.33         6           64         77
   E            5         30.00         6           60         70
   F            6         36.67         6           57         63
   G            7         43.33         5           53         57
   H            8         50.00         5           50         50
   I            9         56.67         5           47         43
   J           10         63.33         4           43         37
   K           11         70.00         4           40         30
   L           12         76.67         4           36         23
   M           13         83.33         3           31         17
   N           14         90.00         2           25         10
   O           15         96.67         1           15          3



The percentile ranks (PR’s) of our 15 salesmen in example (1)
have been entered in Table 45 for comparison with the normal curve
scores. These PR’s were calculated by means of the following for-
mula, which converts orders of merit into percentile ranks.

$$PR = 100 - \frac{(100R - 50)}{N} \qquad (76)$$

(percentile ranks for individuals arranged in order of merit)

The R in the formula is the rank position of the individual, count-
ing No. 1 as the highest rank. Thus, the salesman who ranks No. 1
in 15 has a PR of 100 − (100 × 1 − 50)/15 = 96.67 or 97; the salesman
who ranks 5th has a PR of 100 − (100 × 5 − 50)/15 = 70. Note that
the steps between adjacent PR’s are all equal. Orders of merit as
well as PR’s assume the distribution of ability to be rectangular so
that equal slices of area correspond directly to equal distances along
the baseline.
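
Formula (76) can be sketched in the same way. The function name
`percentile_rank` is hypothetical; the data are the 15 ranks of
Example (1).

```python
def percentile_rank(rank, n):
    """Formula (76): PR of the individual ranked `rank` (1 = best) among `n`."""
    return 100 - (100 * rank - 50) / n

# PR's of the 15 salesmen, rounded to whole numbers as in Table 45.
prs = [round(percentile_rank(r, 15)) for r in range(1, 16)]
print(prs)

# Before rounding, every step between adjacent PR's equals 100/N.
steps = [percentile_rank(r, 15) - percentile_rank(r + 1, 15) for r in range(1, 15)]
```

Each step is 100/15, or about 6.67, which is the sense in which PR's
treat the distribution of ability as rectangular.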

If there are 100 subjects in our group, each occupies one division
of the percentile scale. Hence the PR of the poorest subject is .5
(midpoint of the interval 0-1) and the PR of the best subject is
99.5 (midpoint of interval 99-100). The person who ranks 50th in
the group has a PR of 100 − (100 × 50 − 50)/100 or 50.5, midpoint of
interval 50-51. Since a subject’s PR is always the midpoint of an
interval on a scale which runs from 0 to 100, it follows that no one
can have a PR of 0 or 100. These two points constitute the bounda-
ries or limits of the percentile scale.

Another use to which Table 44 may be put is in the combination of 
incomplete order of merit rankings. To illustrate: 


Example (2) Six persons, A, B, C, D, E, and F, are to be ranked
for honesty by three judges. Judge 1 knows all six well enough to
rank them; Judge 2 knows only three well enough to rank them;
and Judge 3 knows four well enough to rank them. Can we obtain
a fair composite order of merit ranking for all six persons by com-
bining these three sets of rankings, two of which are incomplete?


We may tabulate our data as follows:


                           Persons
                      A    B    C    D    E    F
Judge 1's ranking     1    2    3    4    5    6
Judge 2's ranking          2         1         3
Judge 3's ranking     2         1         3    4

It seems fair that A should get more credit for ranking first in a list
of six than D for ranking first in a list of three, or C for ranking first
in a list of four. In the order of merit ratings, all three individuals are
given the same rank. But when we assign scores to each person, in
accordance with his position in the list, by means of formula (75) and
Table 44, A gets 77 for his first place, D gets 69 for his, and C gets 73
for his. See table below:


                           Persons
                      A    B    C    D    E    F
Judge 1's ranking     1    2    3    4    5    6
            score    77   63   54   46   37   23
Judge 2's ranking          2         1         3
            score         50        69        31
Judge 3's ranking     2         1         3    4
            score    56        73        44   27
Sum of scores       133  113  127  115   81   81
Mean                 67   57   64   58   41   27
Order of Merit        1    4    2    3    5    6


All of the ratings have been transmuted as shown in example (1)
above. Separate scores may be combined and averaged to give the
final order of merit shown in the table.
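
The combination of incomplete rankings in Example (2) can be retraced
with a few lines of code. This is a sketch under stated assumptions:
the individual scores are those implied by formula (75) and Table 44
for lists of six, three, and four persons, consistent with the sums
and means given in the example, and the variable names are
illustrative only.

```python
# Transmuted scores for each person, one entry per judge who ranked him.
scores = {
    "A": [77, 56],       # Judge 1 (rank 1 of 6), Judge 3 (rank 2 of 4)
    "B": [63, 50],       # Judge 1 (rank 2 of 6), Judge 2 (rank 2 of 3)
    "C": [54, 73],       # Judge 1 (rank 3 of 6), Judge 3 (rank 1 of 4)
    "D": [46, 69],       # Judge 1 (rank 4 of 6), Judge 2 (rank 1 of 3)
    "E": [37, 44],       # Judge 1 (rank 5 of 6), Judge 3 (rank 3 of 4)
    "F": [23, 31, 27],   # ranked last by all three judges
}

sums = {person: sum(vals) for person, vals in scores.items()}
means = {person: sums[person] / len(vals) for person, vals in scores.items()}

# Final order of merit: highest mean score first.
order = sorted(means, key=means.get, reverse=True)
print(order)
```

Averaging rather than summing is what keeps F, who was ranked by all
three judges, from gaining credit merely for appearing in more lists.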

By means of formula (75) and Table 44 it is possible to convert 
any set of ranks into "scores," if we may assume a normal distribu- 
tion in the trait for which the ranking is made. The method is useful 
in the case of those attributes which are not easily measured by ordi- 
nary methods, but for which individuals may be arranged in order 
of merit, as, for example, athletic ability, personality, beauty, and 
the like. It is also valuable in correlation problems when the only 
available criterion * of a given ability or aptitude is a set of ranks. 
Transmuted scores may be combined or averaged like other test 
scores. 

A word of explanation may be added with regard to Table 44. This
table represents a normal frequency distribution which has been cut
off at ±2.50σ. The baseline of the curve is 5σ, divided into 100 parts,
each .05σ long. The first .05σ from the upper limit of the curve takes
in .09 of 1% of the distribution and is scored 99 on a scale of 100.
The next .05σ (.10σ from the upper end of the curve) takes in .20
of 1% of the entire distribution and is scored 98. In each case, the
percent position gives the fractional part of the normal distribution
which lies to the right of (above) the given “score” on baseline.
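
The construction just described suggests a way to approximate Table 44
without the printed table. The sketch below is an assumption-laden
illustration, not Hull's own computation: it supposes that the percents
are areas of a normal curve truncated at ±2.50σ and rescaled to total
100, and the function names are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()
CUT = 2.5                                # curve cut off at +/-2.50 sigma
AREA = ND.cdf(CUT) - ND.cdf(-CUT)        # area remaining after the cut

def percent_for_score(score):
    """Percent of the truncated curve lying above a score on the 100-point scale."""
    z = (score - 50) * 0.05              # each score unit is .05 sigma long
    return 100 * (ND.cdf(CUT) - ND.cdf(z)) / AREA

def score_for_percent(percent):
    """Nearest whole score on the 100-point scale for a given percent position."""
    z = ND.inv_cdf(ND.cdf(CUT) - percent / 100 * AREA)
    return round(50 + z / 0.05)

# Scores of the 15 salesmen of Example (1), from their percent positions.
salesmen = [score_for_percent(100 * (r - 0.5) / 15) for r in range(1, 16)]
print(salesmen)
```

Under this assumption the sketch gives percents close to the tabled
values for scores 99, 98, 85, and 50, and it returns the same 100-point
scores as Table 45 for the 15 salesmen. Other entries of the printed
table may differ from it slightly.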


* For definition of a criterion, see Chapter 13, p. 345.


PROBLEMS


1. Five problems are passed by 15%, 34%, 50%, 62%, and 80%, respec-
   tively, of a large unselected group. If the zero point of ability in this
   test is taken to be at −3σ, what is the σ-value of each problem as
   measured from this point?

2. (a) The fifth grade norms for a reading examination are Mean = 60
       and SD = 10; for an arithmetic examination, Mean = 26 and
       SD = 4. Tom scores 55 on the reading and 24 on the arithmetic
       test. Compare his σ-scores. In which test is he better?
   (b) Compare his standard scores in a distribution with M of 100 and
       SD of 20.

3. (a) Locate the deciles in a normal distribution in the following way.
       Beginning at −3σ, count off successive 10%'s of area up to +3σ.
       Tabulate the σ-values of the points which mark off the limits of
       each division. For example, the limits of the first 10% from −3σ
       are −3.00σ and −1.28σ (see Table A). Label these points in order
       from −3σ as .10, .20, etc. Now compare the distances in terms of
       σ between successive ten percent points. Explain why these dis-
       tances are unequal.

   (b) Divide the baseline of the normal probability curve (take as 6σ)
       into ten equal parts, and erect a perpendicular at each point of
       division. Compute the percentage of total area comprised by each
       division. Are these percents of area equal? If not, explain why.
       Compare these percents with those found in (a).


4. Fifty workers are rated on a 7-point scale for efficiency on the job.
   The following data represent the distributions of ratings (in which 1 is
   best and 7 worst) for two judges. Judge X is obviously very lenient and
   Judge Z is very strict. To make these two sets of judgments compara-
   ble, use the following three procedures:

   (a) Percentile scaling: divide each distribution into 5 parts by finding
       successive 20%'s of N. Let A = first 20%, B the next 20%, and so
       on to E, the fifth 20%.
   (b) Standard scores: Find the M and SD for each distribution and con-
       vert each rating into a common distribution with M of 50 and SD
       of 10.
   (c) T-scores: Find T-scores corresponding to ratings of 1, 2, 3 . . . 7.

   Now compare Judge X's rating of 3 with Judge Z's rating of 3 by
   the three methods.

       Judge X                Judge Z
   Rating      f          Rating      f
     1         5            1         2
     2        10            2         4
     3        20            3         4
     4         5            4         5
     5         4            5        20
     6         4            6        10
     7         2            7         5
           N = 50                 N = 50


5. In a large group of competent judges, 77% rank composition A as better
   than composition B; 65% rank B as better than C. If C is known to
   have a σ-value of 3.50 as measured from the “zero composition,” i.e.,
   the composition of just zero merit, what are the σ-values of B and A as
   measured from this zero point?

6. Twenty-five men on a football squad are ranked by the coach in order
   of merit from 1 to 25 for all-around playing ability. On the assumption
   that general playing ability is normally distributed, transmute these
   ranks into “scores” on a scale of 100 points. Compare these scores with
   the PR’s of the ranks.

7. (a) In accordance with their scores upon a learning test, 20 children
       are ranked in order of merit. Calculate the percentile rank of each
       child.
   (b) If 60 children are ranked in order of merit, what is the percentile
       rank of the first, tenth, fortieth, and sixtieth?

8. On an Occupational Interest Blank, each occupation is followed by five
   symbols, L! L ? D D!, which denote different degrees of “liking” and
   “disliking.” The answers to one item are distributed as follows:

       L!     L      ?      D      D!
       8%    20%    38%    24%    10%

   (a) By means of Table H convert these percents into σ-units.
   (b) Express each σ-value as a distance from “zero,” taken at −3σ, and
       multiply by 10 throughout.
   (c) Express each σ-value as a standard score in a distribution of mean
       50, σ 10.

9. Letter grades are assigned three classes by their teachers in English, his-
   tory, and mathematics, as follows:

       Mark    English    History    Mathematics
        A        25         11           6
        B        21         24          15
        C        32         20          25
        D         6          8          20
        F         1          2           8
                 85         65          74

   (a) Express each distribution of grades in percents, and by means of
       Table H transform these percents into σ-values.
   (b) Change these σ-values into 2-digit numbers and into standard
       scores following the method on page 305.
   (c) Find average grades [from (b)] for the following students:

       Student    English    History    Mathematics
        S.H.         A          B           C
        F.M.         C          B           A
        D.B.         B          D           F

10. Calculate T-scores in the following problem:

                        Percent below given score
    Scores      f       plus one-half reaching       T-score
      91        2               99.5                   76
      90        4               98.0                   71
      89        6
      88       20
      87       24
      86       28
      85       40
      84       36
      83       24
      82       12
      81        4
           N = 200


11. Calculate T-scores for the midpoints of the class-intervals in the follow-
    ing distribution:

                        Percent below given interval
    Scores      f       plus one-half reaching midpoint    T-score
    40-44       8               94.6                         66
    35-39      12
    30-34      20
    25-29      15
    20-24      15
    15-19       5
           N = 75

ANSWERS


1. In order: 4.04; 3.41; 3.00; 2.69; 2.16.

2. (a) In neither, same score in both
   (b) Reading 90, Arithmetic 90

3. (a)   .00    .10    .20    .30    .40    .50    .60    .70    .80    .90   1.00
       −3.00  −1.28   −.84   −.52   −.25    0      .25    .52    .84   1.28   3.00
       Diffs:  1.72    .44    .32    .27    .25    .25    .27    .32    .44   1.72

   (b) Percents of area in order: .68; 2.77; 7.92; 15.92; 22.57; 22.57;
       15.92; 7.92; 2.77; .68.

4. (a) C vs. A; (b) 52 vs. 61; (c) 50 vs. 60

5. B, 3.89; A, 4.63

6. Rank:    1   2   3   4   5   6   7   8   9  10  11  12  13
   Score:  89  80  75  71  68  65  63  60  58  56  54  52  50
   PR’s:   98  94  90  86  82  78  74  70  66  62  58  54  50
   Rank:   14  15  16  17  18  19  20  21  22  23  24  25
   Score:  48  46  44  42  40  37  35  32  29  25  20  11
   PR’s:   46  42  38  34  30  26  22  18  14  10   6   2

8.          L!      L       ?       D       D!
   (a)    −1.86   −.94    −.08     .80    1.76
   (b)     11      21      29      38      48
   (c)     31      41      49      58      68

9.              F       D       C       B       A
   (a) English  −2.70   −1.74   −.65     .22    1.18
       History  −2.28   −1.38   −.58     .39    1.49
       Math.    −1.71    −.71    .18     .94    1.86

   (b)       English               History               Mathematics
          −3.00σ  Stan. Score   −3.00σ  Stan. Score   −3.00σ  Stan. Score
       A    42       62           45       65           49       69
       B    32       52           34       54           39       59
       C    24       44           25       45           31       51
       D    13       33           16       36           23       43
       F     3       23            7       27           13       33

   (c) S. H., 36 or 56; F. M., 36 or 56; D. B., 20 or 40

10. T-scores:
    76, 71, 67, 62, 58, 54, 49, 44, 39, 34, 27

11. T-scores:
    66, 59, 53, 47, 40, 32


THE RELIABILITY AND VALIDITY
OF TEST SCORES


I. The Reliability of Test Scores


The reliability of a test, as of any measuring instrument, depends
upon the consistency with which it gauges the abilities of those to
whom it has been applied. When a test is reliable, scores made by the
members of a group, upon retest with the same test or with alter-
nate forms of the same test, will differ very little or not at all from
their original values. A reliable test, therefore, is relatively free of
chance errors of measurement, and scores earned on it are stable and
trustworthy. If a subject scores 84, say, on a reliable test, we feel
confident that this score is close to his true achievement. Scores
made on an unreliable test, on the other hand, are subject to large
errors of measurement and are neither stable nor trustworthy. When
a test is unreliable, subsequent testings will reveal many discrepan-
cies between scores achieved by the same persons on different
occasions.


1. Methods of determining test reliability


There are three procedures in common use for determining the reli-
ability (sometimes called the self-correlation) of a test. These are
(1) the test-retest (repetition) method; (2) the alternate or parallel
forms method; and (3) the split-half method. In addition to these
three, a fourth method, the method of “rational equivalence,” is
also being widely used. All of these procedures furnish “estimates”
of the reliability of test scores; sometimes one method and sometimes
another will give the best estimate.

(1) TEST-RETEST (REPETITION) METHOD

Repetition of a test is the simplest method of determining reliabil-
ity: the test is given and then repeated on the same group and the
correlation is calculated between the first and second sets of scores.
While the test-retest method is sometimes the only feasible proce-
dure, it is open to various objections. If the test is repeated immedi-
ately, many subjects will recall their first answers and spend their
time on new material, thus increasing their scores. Besides the
memory effect, practice and the confidence induced by familiarity
with the material will almost certainly affect scores when one takes
a test for the second time. Transfer effects are likely to be different
from person to person. If the net effect of transfer is to make for
closer agreement between scores achieved on the first and second
giving of a test than would otherwise be the case, the reliability co-
efficient will be too high. When a sufficient time interval has elapsed
between the first and second administrations of the test to offset (in
part, at least) memory, practice, and other effects, the reliability
coefficient will be a closer estimate of the actual consistency of test
scores. If the interval between tests is long, however (say, six
months or so), and the subjects are children, growth or maturity
changes will affect the retest.

The test-retest method will estimate less accurately the reliability
of tests which contain novel features and which are highly suscepti-
ble to practice than it will the reliability of tests involving routine
operations little affected by practice. Because of the difficulty in
controlling the conditions which influence scores on different admin-
istrations of a test, the test-retest method is used less generally than
are the other two methods.


(2) ALTERNATE OR PARALLEL FORMS METHOD

When alternate or parallel forms of a test have been constructed,
the correlation between Form A, say, and Form B is taken as a
measure of the self-correlation of the test. This method is employed
by the authors of most standard psychological and educational tests,
for which alternate forms are usually available.

The alternate forms method is satisfactory if sufficient time has
intervened between the administration of the two forms to weaken
or eliminate memory and practice effects. When Form B of a test
follows Form A very closely, scores on the second test will usu-
ally be increased through practice and familiarity. When such in-
creases are approximately constant (say, three to five points for
each score) the reliability coefficient of the test will not be affected,
since paired A and B scores maintain their same relative positions
in the two distributions. When the mean increase due to practice
has been determined, a constant amount can be subtracted from
Form B scores to make them comparable to Form A scores.* In
drawing up alternate forms of a test, one should be careful to match
test materials for content, difficulty, and form; but one must be
careful not to make the test forms too much alike. If alternate forms
are practically identical, the reliability coefficient of the test will be
too high; while if parallel forms are not sufficiently “duplicate” the
reliability coefficient will be too low.
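
The statement that a constant practice gain leaves the reliability
coefficient unaffected can be checked numerically. The sketch below
uses invented scores for eight pupils (illustrative data only, not
from any real test) and a hand-written product-moment correlation; the
gain of 4 points is an arbitrary assumption.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores of eight pupils on two forms (illustrative data only).
form_a = [42, 55, 61, 48, 70, 66, 53, 59]
form_b = [45, 53, 64, 50, 69, 70, 51, 62]
form_b_practice = [score + 4 for score in form_b]   # constant practice gain

r_plain = pearson_r(form_a, form_b)
r_shifted = pearson_r(form_a, form_b_practice)
print(round(r_plain, 3), round(r_shifted, 3))
```

The two coefficients agree because adding a constant changes the mean
of Form B but not the deviations from that mean, and the correlation
depends only on deviations.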


(3) THE SPLIT-HALF METHOD


In the split-half method the test is broken into two equivalent
parts and the correlation of these half tests is computed. From the
half-test reliability, the self-correlation of the whole test is esti-
mated by the Spearman-Brown formula described on page 339.

The split-half method is employed when it is not feasible to con-
struct an alternate form of the test nor wise to repeat the test. This
situation occurs with many performance tests, as well as with tests
and questionnaires dealing with personality traits, attitudes, and
the like. A performance test (e.g., picture completion, puzzle solving,
form board) is often a very different task when repeated, as the child
is familiar with procedure and content. Likewise, many personality
tests cannot be given in alternate form nor repeated because of radi-
cal changes in the subject’s attitude and interests when taking such
tests for the second time.

The split-half method is often regarded as the best of the methods
for determining test reliability. Perhaps its main advantage is that
all of the data for determining test reliability are obtained upon one
occasion; hence variations introduced by differences between the two
testing situations are eliminated. A disadvantage of the split-half
method is that chance errors may affect the scores on both halves of
the test in the same way, thus tending to make the reliability coeffi-
cient too high. The longer the test, the less the probability that
the effects of temporary and variable disturbances will be cumula-
tive and in one direction, and the more accurate the estimate of
reliability.

* In the Otis Self-Administering Test of Mental Abilities, Higher Examina-
tion, for instance, the author suggests that when Form B, which is slightly more
difficult than Form A, is given first, 4 points be added to each score. This is to
make scores equivalent to the norms for Form B when this test is given after
Form A, as it usually is. See Manual of Directions, Otis S-A Test (Yonkers:
World Book Co., 1928), p. 2.

Objection has been raised to the split-half method on the ground
that a test can be divided into two parts in a variety of ways so that
the reliability coefficient is not a unique value. This criticism is
strictly true only when items are of equal difficulty. When items are
placed in order of merit from least to most difficult, the split into odds
and evens gives a unique determination of the reliability coefficient.
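
The odd-even split can be sketched as follows. The item scores are
invented (six persons by eight items, illustrative data only), the
function names are hypothetical, and the Spearman-Brown step uses the
standard form for a test of doubled length, 2r/(1 + r), which is
assumed here rather than quoted from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def odd_even_halves(item_scores):
    """Half-test totals from items taken in order of difficulty."""
    odd = [sum(row[0::2]) for row in item_scores]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]   # items 2, 4, 6, ...
    return odd, even

def spearman_brown(r_half):
    """Estimated whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Hypothetical 0/1 item scores: six persons by eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
]
odd, even = odd_even_halves(responses)
r_half = pearson_r(odd, even)
r_whole = spearman_brown(r_half)
print(round(r_half, 2), round(r_whole, 2))
```

With so few persons and items the coefficient is of no practical value;
the point is only the order of operations: split, correlate the halves,
then step up to the full length.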


(4) THE METHOD OF “RATIONAL EQUIVALENCE”


The method of rational equivalence * represents an attempt to get
an estimate of the reliability of a test, free from the objections raised
against the methods outlined above. Two forms of a test are defined
as “equivalent” when corresponding items a, A, b, B, etc., are inter-
changeable; and when the inter-item correlations are the same for
both forms. The method of rational equivalence stresses the inter-
correlations of the items in the test and the correlations of the items
with the test as a whole. Four formulas for determining test reliabil-
ity have been derived, of which the one given below is perhaps the
most useful:

$$r_{1I} = \frac{n}{(n - 1)} \times \frac{\sigma_t^2 - \Sigma pq}{\sigma_t^2} \qquad (77)$$

(reliability coefficient of a test in terms of the difficulty
and the intercorrelations of test items)

in which
$r_{1I}$ = reliability coefficient of the whole test;
$n$ = number of items in the test;
$\sigma_t$ = the SD of the test scores;
$p$ = the proportion of the group answering a test item correctly;
$q$ = $(1 - p)$ = the proportion of the group answering a test item
incorrectly.

* Kuder, G. F., and Richardson, M. W., “The Theory of Estimation of Test
Reliability,” Psychometrika, 1937, 2, 151-160.
Richardson, M. W., and Kuder, G. F., “The Calculation of Test Reliability
Coefficients Based upon the Method of Rational Equivalence,” Journal of Edu-
cational Psychology, 1939, 30, 681-687.

To apply formula (77) the following steps are necessary:


Step 1

Compute the SD of the test scores for the whole group, namely, $\sigma_t$.

Step 2

Find the proportions passing each item (p) and the proportions
failing each item (q).

Step 3

Multiply p and q for each item and sum for all items. This gives
$\Sigma pq$.

Step 4

Substitute the calculated values in formula (77).

To illustrate, suppose that a test of sixty items has been adminis-
tered to a group of eighty-five subjects; $\sigma_t$ = 8.50 and $\Sigma pq$ = 12.43.
Applying (77) we have

$$r_{1I} = \frac{60}{59} \times \frac{72.25 - 12.43}{72.25} = .842$$

which is the reliability coefficient of the test.

A simple approximation to formula (77) has been devised.† This
formula is useful to teachers and others who want to determine
quickly the reliability of short objective classroom examinations or
other tests. It reads:

$$r_{1I} = \frac{n\sigma_t^2 - M(n - M)}{\sigma_t^2 (n - 1)} \qquad (78)$$

[approximation to formula (77)]

in which
$r_{1I}$ = reliability of the whole test;
$n$ = number of items in the test;
$\sigma_t$ = SD of the test scores;
$M$ = the mean of the test scores.

† Froelich, G. J., “A Simple Index of Test Reliability,” Journal of Educational
Psychology, 1941, 32, 381-385.

Formula (78) is a labor saver since only the mean, SD and number
of items in the test need be known in order to get an estimate of reli-
ability. The correlation need not be computed between alternate
forms or between halves of the test. Suppose that an objective test
of forty multiple-choice items has been administered to a small class
of students. An item answered correctly is scored 1, an item an-
swered incorrectly is scored 0. The mean test score is 25.70 and
$\sigma_t$ = 6.00. What is the reliability coefficient of the test? Substituting
in (78), we have

$$r_{1I} = \frac{40 \times 36.00 - 25.70(40 - 25.70)}{36.00 \times 39} = .76$$
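
Both worked examples can be verified with a short script. The sketch
below simply transcribes formulas (77) and (78); the function names are
hypothetical, and the inputs are the summary figures given in the text.

```python
def sum_pq(proportions_passing):
    """Sum of p*q over items, given the proportion passing each item."""
    return sum(p * (1 - p) for p in proportions_passing)

def rational_equivalence(n_items, sd_total, sum_of_pq):
    """Formula (77): reliability from item difficulties and the test SD."""
    variance = sd_total ** 2
    return (n_items / (n_items - 1)) * (variance - sum_of_pq) / variance

def rational_equivalence_approx(n_items, mean, sd_total):
    """Formula (78): approximation needing only n, the mean, and the SD."""
    variance = sd_total ** 2
    return (n_items * variance - mean * (n_items - mean)) / (variance * (n_items - 1))

r77 = rational_equivalence(60, 8.50, 12.43)          # sixty-item example
r78 = rational_equivalence_approx(40, 25.70, 6.00)   # forty-item example
print(round(r77, 3), round(r78, 2))                  # 0.842 0.76
```

When the item proportions themselves are at hand, `sum_pq` supplies the
Σpq term of formula (77) directly from them.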

The assumption is made in formula (78) that all test items have 
the same degree of difficulty, i.e., that the same proportion of subjects 
(but not necessarily the same persons) pass each item. In a power 
test items are never of equal difficulty. Formula (78) will give a 
satisfactory approximation to the test’s reliability, however, even 
when the test items cover a wide range of difficulty. Formula (78) 
always underestimates to a slight degree the reliability of a test as 
found by the split-half technique and the Spearman-Brown for- 
mula, and the more widely items vary in difficulty the greater the 
underestimation. This formula provides a minimum estimate of
reliability: we may feel sure that the test is at least as reliable as we
have found it to be by (78).

Formulas (77) and (78) are not strictly comparable to the three
methods for determining the reliability of test scores given above.
In a sense, these formulas provide an estimate of the internal
consistency of the test rather than an estimate of the dependability of
test scores. The method of rational equivalence is superior to the
split-half technique in certain theoretical aspects, but differences in
reliability as found by the two methods are never very large (of the
order .02, etc.). Formula (78) is often to be preferred to the
split-half method because of the time and calculation it saves rather than
for other reasons.


2. Factors influencing the reliability of test scores: chance and constant
errors


Many factors affect the reliability of a test besides fluctuations in
interest and attention, shifts in emotional attitude, and the
differential effects of memory and practice. To these “psychological” factors
must be added environmental disturbances such as distractions,
noises, interruptions, errors in scoring, and the like. All of these
variable influences (environmental and psychological) are subsumed
under the head “chance errors.” Errors, to be truly “chance,” must
influence a score in such a way as to cause it to vary above its
“true” value as often as below it. The reliability coefficient is a
quantitative estimate of the importance of chance or variable influences
upon test scores.


Constant errors, as distinguished from chance errors, work in only
one direction. Constant errors may raise or lower all of the scores
on a retest or on the alternate forms of the test, but will not affect the
reliability coefficient. If every person taking Form B of a test is
scored 5 points too high, for example, the self-correlation of the test
(i.e., the correlation between Forms A and B) will not be affected,
but all of the scores on the second form will be in error by 5
points.

How high should the self-correlation of a test be in order for the
reliability of the test to be considered satisfactory? This is an
important question, and its answer depends upon the nature of the test,
the size and variability of the group tested, and the purpose for
which the test was given. To distinguish reliably between the means
of two relatively small groups of narrow range of ability (for
example, a fifth grade and a sixth grade) a reliability coefficient need be
no higher than .50 or .60. If the test is to be used to differentiate
among the individuals in the group, however, its reliability should be
.90 or more. Most of the authors of intelligence tests and educational
achievement examinations report correlations of .90 or more between
alternate forms of their tests. Since the self-correlation of a test is
directly affected by the variability within the group, in reporting a
test’s reliability coefficient the standard deviation of the group should
always be given.


3. The effect upon reliability of lengthening or repeating a test 


(1) THE RELIABILITY COEFFICIENT FROM MANY APPLICATIONS OR
REPETITIONS OF A GIVEN TEST


The mean of five determinations of height will, in general, be more
reliable than a single determination (p. 183), and the mean of ten
determinations will (in general) be more reliable than the mean of
five. On the same principle, increasing the length of the test, or
averaging the results obtained from several applications of the test,
or from alternate forms, will tend to increase reliability. If the
self-correlation of a test is not satisfactory, what will be the effect of
doubling or tripling the test’s length? To answer this question
experimentally would require considerable time and labor.
Fortunately, a good measure of the effect of lengthening or repeating a test
may be obtained from the Spearman-Brown “prophecy formula”:


$$ r_{nn} = \frac{n r_{11}}{1 + (n - 1) r_{11}} \tag{79} $$

(Spearman-Brown formula for estimating the correlation
between n forms of a test, and n other similar forms)

in which

$r_{nn}$ = the correlation between n forms of a test and n alternate forms
(or the mean of n forms against the mean of n other forms);
$r_{11}$ = the reliability coefficient.

The subscripts (“11”) show that the correlation is between two forms
of the same test.

To illustrate the use of formula (79) suppose that in a group of
100 adults the self-correlation of a test is .70. What will be the effect
upon test reliability of tripling the length of the test? Substituting
$r_{11} = .70$ and $n = 3$ in formula (79) and solving for $r_{nn}$, we have

$$ r_{nn} = \frac{3 \times .70}{1 + 2 \times .70} = \frac{2.10}{2.40} = .88 $$

Tripling the test’s length, therefore, increases its reliability
coefficient from .70 to .88. Instead of tripling the length of the test we
could give three parallel forms of the test and average the three
scores made by each person. The reliability of these mean scores
(each based upon three measures) will be the same, as far as purely
statistical factors are concerned, as the reliability got by tripling the
length of the test.
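Formula (79) is easily expressed as a small Python function, shown here as a sketch. The name `spearman_brown` and the argument names are our own choices; the function assumes, as the formula does, that the added material is comparable to the original test.

```python
def spearman_brown(r11: float, n: float) -> float:
    """Predicted reliability of a test lengthened (or repeated) n times.

    Implements formula (79): n*r11 / [1 + (n - 1)*r11].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return n * r11 / (1 + (n - 1) * r11)


# Tripling a test whose self-correlation is .70, as in the text.
print(round(spearman_brown(0.70, 3), 3))  # prints 0.875, i.e. .88 to two places
```

Values of n between 0 and 1 may also be supplied, in which case the function estimates the reliability of a shortened test.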

The prophecy formula may also be used to find how many times a
test should be repeated in order for test scores to reach a given
standard of reliability. Suppose that the self-correlation of a test is .80.
How much will the test have to be lengthened, or how many times
repeated, in order to insure a reliability coefficient of .95?
Substituting $r_{11} = .80$ and $r_{nn} = .95$ in the formula, and solving for $n$, we have

$$ .95 = \frac{.80n}{1 + .80n - .80} = \frac{.80n}{.20 + .80n} $$

and

n = 4.75 or 5 in whole numbers
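The same algebra can be sketched in Python. Solving formula (79) for n gives n = r_nn(1 - r_11) / [r_11(1 - r_nn)]; this rearrangement, and the function name `lengthening_needed`, are ours and do not appear as a numbered formula in the text.

```python
import math


def lengthening_needed(r11: float, target: float) -> float:
    """Number of times a test must be lengthened to reach a target reliability.

    Obtained by solving formula (79) for n.
    """
    if not (0 < r11 < 1 and 0 < target < 1):
        raise ValueError("both coefficients must lie strictly between 0 and 1")
    return target * (1 - r11) / (r11 * (1 - target))


n = lengthening_needed(r11=0.80, target=0.95)
print(round(n, 2), math.ceil(n))  # prints 4.75 5
```

Rounding up rather than to the nearest whole number guarantees that the predicted reliability is not less than the standard set.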


The test must be five times its present length, therefore, or five
alternate forms must be given and averaged, before the self-correlation
of the test will reach .95.

Predictions of test reliability by the Spearman-Brown formula are
valid only when the items or questions added to the test cover the
same ground, are of equal range of difficulty, and are comparable in
other respects to the items of the original test. When these conditions
are satisfied, there would appear to be no reason, as far as the
mathematical process is concerned, why we could not boost the
self-correlation of a test to any desired figure, simply by continuing to increase
its length or by continuing to repeat it. But it is highly improbable
that the reliability coefficient of a test could be so increased
indefinitely. In the first place, it is impracticable if not impossible to
increase a test’s length, say, ten or fifteen times. Furthermore,
beyond a certain point, boredom, fatigue, loss of incentive, and the like
inevitably affect our results and lead to “diminishing returns.” When
the material added to the test is strictly comparable to the original
test items, and when motivation remains substantially constant, the
experimental evidence * indicates that a test may be increased to six
or seven times its original length, and the Spearman-Brown formula
will still give a close estimate of empirically determined results. But
after the first four or five lengthenings the prophecy formula may
“over-predict,” that is, give higher estimated reliabilities than those
obtained by actual calculation. This is not an especially serious
drawback, however, as a test which needs so much lengthening in order
to yield reliable results should be radically changed in form or
content, or better still, perhaps, discarded in favor of another test.

The Spearman-Brown formula may be applied to ratings, judg- 
ments, and other estimates as well as to test items. When measuring 
the reliability of a personality rating scale, for instance, by correlat- 
ing the ratings made by two equally competent judges, we may em- 
ploy the prophecy formula to estimate the increased reliability which 
might be expected if there were four, six or more judges. 


* Holzinger, K. J., and Clayton, B., “Further Experiments in the Application
of Spearman’s Prophecy Formula,” Journal of Educational Psychology, 1925,
16, 289-299.
Ruch, G. M., Ackerson, Luton, and Jackson, J. D., “An Empirical Study of the
Spearman-Brown Formula as Applied to Educational Test Material,” Journal
of Educational Psychology, 1926, 17, 309-313.
† Remmers, H. H., Shock, N. W., and Kelly, E. L., “An Empirical Study of
the Validity of the Spearman-Brown Formula as Applied to the Purdue Rating
Scale,” Journal of Educational Psychology, 1927, 18, 187-195.


(2) THE RELIABILITY COEFFICIENT FROM ONE APPLICATION OF A TEST


When a test has no alternate form and cannot well be repeated, we
may calculate the reliability of half of the test and then proceed to
estimate the reliability of the whole test by the Spearman-Brown
formula. This method is called the “split-half technique” (p. 334).
The procedure is to make up two sets of scores by combining, say,
alternate exercises or items in the test. The first set of scores
represents, for example, performance on the odd-numbered items, 1, 3,
5, 7, etc.; and the second set of scores performance on the
even-numbered items, 2, 4, 6, 8, etc. Other ways of making the two halves of
the test as comparable as possible in content, difficulty, and
susceptibility to practice may be employed, but the method described is the
one most commonly used. From the self-correlation of the half test,
the reliability coefficient of the whole test may be estimated from the
formula


$$ r_{11} = \frac{2\, r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}} \tag{80} $$

(Spearman-Brown formula for estimating reliability
from two comparable halves of a test)

in which

$r_{11}$ = the reliability coefficient of the whole test;
$r_{\frac{1}{2}\frac{1}{2}}$ = the reliability coefficient of one-half of the test, found
experimentally.

When the reliability coefficient of one-half of a test ($r_{\frac{1}{2}\frac{1}{2}}$) is .60 it
follows from formula (80) that the reliability of the whole test ($r_{11}$)
is .75.
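The split-half procedure may be sketched in Python as follows. The item scores below are invented solely for illustration (six persons, six items scored 1 or 0), and the function names `pearson_r`, `step_up_half` and `split_half_reliability` are our own; the sketch uses the odd-even division described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary in both halves")
    return sxy / sqrt(sxx * syy)


def step_up_half(r_half):
    """Formula (80): whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Correlate odd-item and even-item totals, then apply formula (80)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, step_up_half(r_half)


# Hypothetical data: one row per person, one column per item.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_whole = split_half_reliability(scores)
print(round(r_half, 2), round(r_whole, 2))
print(round(step_up_half(0.60), 2))  # the text's example: .60 becomes .75
```

With so few persons the coefficients themselves mean little; the point is only the order of operations, namely two half scores per person, one correlation, and one application of formula (80).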




4. The index of reliability


An individual’s “true score” on a test (p. 185) is defined as the
mean of a very large number of determinations made of the given
person on the same test or parallel forms of the test administered
under approximately identical conditions. The correlation between
a series of obtained scores and their corresponding theoretically
“true” scores may be found by the formula

$$ r_{1\infty} = \sqrt{r_{11}} \tag{81} $$

(correlation between obtained scores on a given test and
true scores in the function measured by the test)

in which

$r_{11}$ = the reliability coefficient of the given test;
$r_{1\infty}$ = the correlation between obtained and true scores.

The symbol “∞” (infinity) designates “true scores,” that is, scores
obtained from an “infinite” number of administrations of the test
to the same group.

The coefficient $r_{1\infty}$ is called the index of reliability; it measures the
trustworthiness of test scores by showing how well obtained scores
agree with their theoretically true counterparts. The index of
reliability gives the maximum correlation which the given test is
capable of yielding. This follows from the fact that “the highest possible
correlation which can be obtained (except as chance might
occasionally lead to higher spurious correlation) between a test and a second
measure is with that which truly represents what the test actually
measures, that is, the correlation between the test and the true scores
of individuals in just such tests.” *

To illustrate the application of the index of reliability, suppose
that for a given test the self-correlation is .64. Then $r_{1\infty} = \sqrt{.64}$ or
.80; and .80 is the highest correlation of which this test is capable,
since it represents the relationship between obtained test scores and
true test scores in the same function. If the self-correlation of a test
is only .25, so that $r_{1\infty} = \sqrt{.25}$ or .50, it is obviously a waste of time
to continue using this test without lengthening or otherwise
improving it. A test whose index of reliability is only .50 is an extremely
poor estimate of the function which it is trying to measure.


5. The standard error of an obtained score 


The effects of variable or chance errors in producing divergencies
of obtained scores from their true counterparts may be estimated
by the formula

$$ \sigma_{1\infty} = \sigma_1 \sqrt{1 - r_{11}} \tag{82} $$

(standard error of an obtained score)

in which

$\sigma_{1\infty}$ = the standard error of an obtained score (sometimes called the
“standard error of measurement”);
$\sigma_1$ = the standard deviation of the test scores;
$r_{11}$ = the reliability coefficient of the test.


The subscript “1∞” indicates this standard deviation to be a measure
of the error made in taking an obtained score (i.e., 1) as an estimate
of the true score (i.e., ∞).

* Kelley, T. L., “The Reliability of Test Scores,” Journal of Educational
Research, 1921, 3, 327.

To illustrate the use of $\sigma_{1\infty}$ suppose that
in a group of 300 college freshmen the reliability coefficient of an
aptitude test in mathematics is .92 and the SD of this distribution is
15.00. From formula (82) we have

$$ \sigma_{1\infty} = 15\sqrt{1 - .92} = 4.2 \text{ or 4 in whole numbers} $$

and the odds are 2:1 that the obtained score made by any individual
in the group does not differ from his true score by more than ±4
points. If subject AB has a score of 85, we may feel confident (the
chances are .95) that his score “actually” lies between 77 and 93
(85 ± 1.96 × 4.2).* Generalizing for the entire group, we should expect
about two-thirds of the 300 scores to be in error by 4 points or
less; the other one-third (or 100) to be in error by more than 4
points.
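The freshman example can be reproduced with a short Python sketch of formula (82). The function names `standard_error_of_score` and `score_band` are ours; the multiplier 1.96 is the normal-curve value for chances of .95 used in the text.

```python
from math import sqrt


def standard_error_of_score(sd: float, r11: float) -> float:
    """Formula (82): SD of obtained scores times sqrt(1 - reliability)."""
    if not 0 <= r11 <= 1:
        raise ValueError("the reliability coefficient must lie between 0 and 1")
    return sd * sqrt(1 - r11)


def score_band(obtained: float, se: float, z: float = 1.96):
    """Limits within which the true score is expected to lie."""
    return obtained - z * se, obtained + z * se


se = standard_error_of_score(sd=15.00, r11=0.92)
low, high = score_band(85, se)
print(round(se, 1), round(low), round(high))  # prints 4.2 77 93
```

The band widens as the reliability falls or as the SD of the group increases, which is why both figures must be known before an individual score is interpreted.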

The reader should note carefully the difference between $\sigma_{(est)}$, the
standard error of estimate (see p. 162), and $\sigma_{1\infty}$. The first formula
enables us to say with what degree
of assurance we can predict an individual’s score on one test when we
know his score on a second (and usually a different) test. The actual
prediction of the most probable score is made, of course, by way of
the regression equation connecting the two variables (p. 159). The
SE of an obtained score, $\sigma_{1\infty}$, is also an estimate formula; it tells us
how adequately an obtained score represents the true score. Although
the true score is unknown, we can, nevertheless, tell from $\sigma_{1\infty}$ how
much our obtained score probably misses the true value. The SE of
an obtained score is the best method of expressing the reliability of a
test, since it takes account of the self-correlation of the test as well
as of the variability within the group.

Formula (82) provides a general estimate of the SE of any score
over the entire range of the test. When the range is wide, the
agreement of scores on two forms of the test may differ considerably at
successive parts of the scale. To refine our estimate of the reliability
of our test scores, we may compute $\sigma_{1\infty}$ for different levels of
achievement. This has been done for the new Stanford-Binet; the $\sigma_{1\infty}$ for
I.Q.’s 130 and above, for example, is 5.24, for I.Q.’s 90-109, 4.51, for
I.Q.’s 70 and below, 2.21, etc. The method is described in the
references given below.†


* See p. 187.
† Terman, L. M., and Merrill, M. A., Measuring Intelligence (Boston: Houghton
Mifflin Co., 1937), p. 46.
McNemar, Quinn, “The Expected Average Difference between Individuals
Paired at Random,” Journal of Genetic Psychology, 1933, 43, 438-439.


6. The dependence of the reliability coefficient upon the range and
variability of the group


The reliability coefficient of a test administered to a group of small 
range (a single grade, say) cannot be compared directly with the
reliability coefficient of the same test administered to a group of 
greater range, e.g., to the children in several grades. The self-corre- 
lation of a test (like any correlation coefficient) is affected by the 
variability of the group; and the larger and more heterogeneous the 
group, the greater test variability tends to be. If we know the self- 
correlation of a test in a narrow range we can estimate the self-corre- 
lation of the same test in an increased range (ordinarily a larger 
group) by the formula 


$$ \frac{\sigma_s}{\sigma_l} = \frac{\sqrt{1 - r_{ll}}}{\sqrt{1 - r_{ss}}} \tag{83} $$

(relation between σ’s and reliability coefficients obtained in different
ranges when the test is equally effective throughout both ranges)

in which

$\sigma_s$ and $\sigma_l$ = the σ’s of the test scores in the small and large ranges,
respectively;
$r_{ss}$ and $r_{ll}$ = the reliability coefficients in the small and large ranges.


To illustrate the use of formula (83) suppose that for a single fifth
grade, $r_{ss} = .50$, and $\sigma_s = 5.00$; and that for a larger group made up
of children from grades three to seven, $\sigma_l = 15.00$. Assuming our test
to be as effective in the wide range as in the narrow, what is the
reliability coefficient of the test in the wide range? If we substitute for
$\sigma_s$, $\sigma_l$ and $r_{ss}$ in formula (83), $r_{ll} = .94$. This means that a reliability
coefficient of .50 in the narrow range indicates as high a degree of test
consistency as a reliability coefficient of .94 in a group in which the
range is three times as wide.
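Squaring both sides of formula (83) and solving for the unknown coefficient gives r_ll = 1 - (σ_s/σ_l)²(1 - r_ss). A Python sketch of this rearrangement follows; the rearrangement and the function name `reliability_in_other_range` are ours, and the calculation rests on the text's assumption that the test is equally effective in both ranges.

```python
def reliability_in_other_range(r_known: float, sd_known: float, sd_other: float) -> float:
    """Estimate the reliability coefficient in a group of different variability.

    Rearranged from formula (83): 1 - (sd_known / sd_other)^2 * (1 - r_known).
    """
    if sd_known <= 0 or sd_other <= 0:
        raise ValueError("standard deviations must be positive")
    return 1 - (sd_known / sd_other) ** 2 * (1 - r_known)


# Single fifth grade (r = .50, SD = 5.00) extended to grades three to seven (SD = 15.00).
print(round(reliability_in_other_range(0.50, 5.00, 15.00), 2))  # prints 0.94
```

The same function works in the opposite direction, from a wide range to a narrow one, since formula (83) is symmetrical in the two groups.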


II. The Validity of Test Scores

The validity of a test, or of any measuring instrument, depends
upon the fidelity with which it measures whatever it purports to
measure. A homemade yardstick is valid when measurements made
by it are proved to be accurate by standard measuring rods. And in
the same way a test is valid when the capacity which it gauges
corresponds to the same capacity as otherwise objectively measured and
defined. The difference between validity and reliability can be made
clear, perhaps, by an illustration. Suppose a clock is set forward
twenty minutes. If the clock is a good timepiece, the time it “tells”
will be reliable (i.e., consistent), but it will not be valid as judged by
“standard time.” The reliability of the measurements made by
scales, thermometers, yardsticks, chronoscopes, clocks, etc., is
determined by making repeated measurements of the same facts; and
validity is determined by comparing the measures returned by the
given instrument with highly precise (if arbitrary) standard
measures. The reliability of mental measures is found in the same way.
But since precise and independent standards (criteria) are rarely
found in mental measurement, the validity of a test can never be
estimated as precisely as can the validity of a thermometer or a
rheostat.


1. The determination of validity through correlation with a criterion


The validity of a test is determined directly, whenever possible, by
finding the correlation between the test and some independent
criterion. A criterion is an objective measure in terms of which the
value of the test is estimated or judged. The criteria for evaluating
a general intelligence examination, for example, may be school
marks, ratings for aptitude in learning, or some other test believed to
be valid, such as Stanford-Binet. A trade test may be validated
against demonstrated ability to carry on the required operations as
shown in actual performance.* A high correlation between a test and
a criterion is evidence of validity provided the test and the criterion
are both reliable. But before accepting criterion correlations, we
must know the reliability of the test and if possible the reliability of
the criterion.

* Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques,
op. cit., Chapters 5 and 8 especially.

When a criterion is not immediately available, indirect methods
may be utilized for estimating the validity of a test. We may, for
example, compute the average correlation which each test in a
battery shows with all of the other tests, and estimate the validity (i.e.,
the representativeness) of each test by the size of its correlations.
Again, following essentially the same method, we may combine the
scores on a number of tests designed to measure the same function
(memory, say), and consider as most valid that test which correlates
highest with the average of them all. Anastasi,* for example, found
that of eight tests of immediate memory, the paired-associates test
(geometric form paired against numbers) had the largest average
correlation (i.e., .49) with the other tests of the battery. This test,
then, is the most valid measure of the function tapped in common by
all of the tests.


2. The correction for attenuation 


The correlation between a test and its criterion will be reduced if 
either the test scores or the criterion scores or both are unreliable. 
In order to estimate the correlation between true scores in two vari- 
ables, we need to make a correction which will take account of the 


unreliability in both sets of measures. Such a correction is given by 
the formula 


$$ r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{11} \times r_{22}}} \tag{84} $$

(correlation between true measures in Tests 1 and 2)

in which

$r_{\infty\omega}$ = correlation between true scores in Tests 1 and 2;
$r_{12}$ = correlation between obtained scores in Tests 1 and 2;
$r_{11}$ = reliability coefficient of Test 1;
$r_{22}$ = reliability coefficient of Test 2.


Formula (84) is the well-known correction for attenuation
formula. It provides a correction for the effects of those chance or
accidental errors in the two tests which lower the reliability coefficients
of both tests and thus affect the correlation between them. To
illustrate the application of formula (84), let the obtained correlation
between two tests A and B be .60, the reliability coefficient of Test A
be .80 ($r_{11}$) and the reliability coefficient of Test B be .90 ($r_{22}$).
What is the correlation between Tests A and B freed of chance
errors? Substituting the given values in formula (84), we have

$$ r_{\infty\omega} = \frac{.60}{\sqrt{.80 \times .90}} = .71 $$

as the estimated correlation between true scores in A and B. Our
corrected coefficient of correlation represents the relationship which
we should expect to obtain if our two sets of test scores were perfect
measurements.

* Anastasi, A., “A Group Factor in Immediate Memory,” Archives of
Psychology, 1930, 120, p. 41.

It is clear from formula (84) that correcting for chance errors will
always raise the correlation between two tests, unless the reliability
coefficients are both 1.00. Chance errors, therefore, always lower
or attenuate an obtained correlation coefficient. The expression
$\sqrt{r_{11} \times r_{22}}$ sets an upper limit to the correlation which we can obtain
between two tests as they stand. In the example above, $\sqrt{.80 \times .90}$
= .85; hence, Tests A and B cannot correlate higher than .85, as
otherwise their corrected r would be greater than 1.00.
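Both the corrected coefficient and the upper limit just described can be computed with a brief Python sketch of formula (84). The function names `correct_for_attenuation` and `max_obtainable_r` are our own labels.

```python
from math import sqrt


def correct_for_attenuation(r12: float, r11: float, r22: float) -> float:
    """Formula (84): obtained r divided by sqrt(r11 * r22)."""
    if r11 <= 0 or r22 <= 0:
        raise ValueError("reliability coefficients must be positive")
    return r12 / sqrt(r11 * r22)


def max_obtainable_r(r11: float, r22: float) -> float:
    """Upper limit to the obtained correlation between two tests as they stand."""
    return sqrt(r11 * r22)


# Tests A and B: obtained r = .60, reliabilities .80 and .90.
print(round(correct_for_attenuation(0.60, 0.80, 0.90), 2))  # prints 0.71
print(round(max_obtainable_r(0.80, 0.90), 2))               # prints 0.85
```

A corrected value above 1.00 signals that the reliability coefficients supplied are too low for the obtained r, and should be treated as a warning rather than a result.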

Let us assume the correlation between first-year college grades and
a general intelligence test to be .46; the reliability of the intelligence
test to be .82; and the reliability of college grades to be .70. The
maximum correlation which we could hope to obtain between these
two measures is $.46 / \sqrt{.70 \times .82}$, or .60. Knowing that the correlation
between grades and general intelligence, corrected for errors of
measurement, has a probable maximum value of .60 gives us a better
notion of the “intrinsic” relationship between the two variables. At
the same time, the investigator should remember that the r of .60 is
a theoretical, not an obtained, value; that it gives an estimate of the
relationship to be expected when the tests are more effective than
they actually were in the present instance. If many sources of error
are present so that considerable correction is necessary, it would be
better experimental technique to improve the tests and the
experimental conditions than to correct the obtained r.

The investigator must be careful how he applies formula (84) to
correlations which have been averaged, as in such cases the reliability
coefficients may be lower than the correlations between the two tests.
When this happens $r_{\infty\omega}$ is greater than 1.00. Such a result is logically
and psychologically meaningless. If a corrected r is 1.00, or is only
slightly greater than 1.00, however, it may be taken as indicating
complete agreement between the two variables within the error of
computation.


3. The estimation of the true σ of a test


Chance or variable errors have a marked effect upon the standard
deviation of a test, as well as upon the r between tests. The relation
of the σ calculated from obtained scores on a test to the σ of true
scores on the same test is given by the formula

$$ \sigma_\infty = \sigma_1 \sqrt{r_{11}} \tag{85} $$

(relation between true and obtained σ’s for a set of test scores)

in which

$\sigma_\infty$ = the σ of the true test scores;
$\sigma_1$ = the σ of the obtained test scores;
$r_{11}$ = the reliability coefficient of the test.


Suppose an educational achievement test of seventy-five items
has been administered to a group of fifty children. The obtained
standard deviation, $\sigma_1$, is 10, and the reliability coefficient of the test
($r_{11}$) is .50. What is $\sigma_\infty$, the σ of the true scores from which variable
or accidental errors have been eliminated? Substituting $\sigma_1 = 10$, and
$r_{11} = .50$ in formula (85)

$$ \sigma_\infty = 10\sqrt{.50} = 7.1 $$

and the “true σ” of the test is about 7 points.

It is clear from (85) that $\sigma_\infty$ will always be smaller than $\sigma_1$, except
in the improbable case in which $r_{11} = 1.00$. The effect of chance
errors of measurement, then, is always to increase the spread ($\sigma_1$) of
obtained test scores or of criterion scores.
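Formulas (82) and (85) taken together divide the obtained variance into a true part and an error part, since σ_1² r_11 + σ_1²(1 - r_11) = σ_1². The Python sketch below computes the true σ for the example above and verifies this identity; the function names are ours.

```python
from math import sqrt


def true_sd(sd_obtained: float, r11: float) -> float:
    """Formula (85): SD of true scores = obtained SD times sqrt(reliability)."""
    return sd_obtained * sqrt(r11)


def error_sd(sd_obtained: float, r11: float) -> float:
    """Formula (82): standard error of an obtained score."""
    return sd_obtained * sqrt(1 - r11)


sd_true = true_sd(10, 0.50)
sd_error = error_sd(10, 0.50)
print(round(sd_true, 1))  # prints 7.1

# True variance plus error variance restores the obtained variance (10^2 = 100).
print(round(sd_true ** 2 + sd_error ** 2, 6))  # prints 100.0
```

When the reliability is .50, as here, half of the obtained variance is error, so the true σ and the standard error of a score are equal.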


4. Validation of a test battery * 


A criterion of job efficiency, say, or of success in salesmanship may 
be forecast by a battery consisting of four, five, or more tests. The 
validity of such a battery is determined by the multiple correlation 
coefficient, R, between the battery and the criterion. The weights to
be attached to scores on the sub-tests of the battery are given directly 
by the regression coefficients (p. 393). 

* Gulliksen, H., Theory of Mental Tests (New York: John Wiley and Sons,
1950), Chapter 20 especially.

If the regression weights are small fractions (as they often are)
whole numbers may be substituted for them with little if any loss in
accuracy. For example, suppose that the regression equation
joining the criterion and the tests in a battery reads as follows:

$$ C \text{ (criterion)} = 4.32X_1 + 3.12X_2 - .65X_3 + 8.35X_4 + K \text{ (constant)} $$

Dropping fractions and taking the nearest whole numbers, we have

$$ C = 4X_1 + 3X_2 - 1X_3 + 8X_4 + K $$

Scores in Test 1 should be multiplied by 4, scores in Test 2 by 3,
scores in Test 3 by −1, and scores in Test 4 by 8, in order to
provide the best forecast of C, the criterion. The fact that Test 3 has a
negative weight does not mean that this test has no value in
forecasting C, but simply that the best estimate of C is obtained by giving
scores in Test 3 a negative value.
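How little is lost by rounding the weights can be seen by computing both forecasts for the same persons and correlating them. The Python sketch below does this for six invented sets of battery scores; the scores, the value of the constant K (set to zero, since the text does not give it), and the function names are all assumptions made for illustration.

```python
from math import sqrt

EXACT_WEIGHTS = (4.32, 3.12, -0.65, 8.35)  # regression weights from the text
K = 0.0  # hypothetical constant; its value is not stated in the text


def forecast(scores, weights, constant=K):
    """Weighted sum of the four test scores plus the constant."""
    return sum(w * x for w, x in zip(weights, scores)) + constant


def pearson_r(x, y):
    """Product-moment correlation between two lists of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rounded_weights = tuple(round(w) for w in EXACT_WEIGHTS)

# Hypothetical scores of six persons on Tests 1 to 4.
battery_scores = [
    (12, 20, 8, 15),
    (15, 18, 10, 12),
    (9, 25, 7, 18),
    (14, 22, 12, 10),
    (11, 16, 9, 14),
    (16, 24, 6, 17),
]
exact = [forecast(s, EXACT_WEIGHTS) for s in battery_scores]
approx = [forecast(s, rounded_weights) for s in battery_scores]

print(rounded_weights)  # prints (4, 3, -1, 8)
print(round(pearson_r(exact, approx), 3))
```

For these invented scores the two sets of forecasts rank the six persons in the same order, and the correlation between them is very close to 1.00.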


III. Item Analysis


In Section II above, we considered the validity of final test scores.
The validity of a test score also depends directly upon the care with
which the items in the test have been chosen. While the subject of
item analysis properly belongs in a book on test construction, the
main features of the process may be outlined here. Item analysis
may be divided into three main topics: (1) item selection, (2) item
difficulty, and (3) item validity.


1. Item selection


The initial choice of test items depends upon the judgment of
competent persons as to the suitability of the material for the purposes of
the test. Certain types of items, for instance, have proved to be
generally useful in intelligence examinations. Problems in mental
arithmetic, for example, vocabulary, analogies and number series
completion, are often encountered; also, items requiring generalization,
interpretation and the ability to see relations. The validity of most
standard tests of educational achievement depends upon the
consensus of teachers and other competent judges as to the adequacy of the
items included. Courses of study, requirements for different grades,
curricula from different sections of the country are carefully culled
over by the test makers to determine what material in history,
English, geography, etc., should be included in an educational
achievement battery designed, say, for the seventh grade. In its final form
the educational achievement test represents items carefully selected
from all available sources of information.

Items used in personal data sheets, interest inventories, attitude
scales and the like, also represent a consensus of experts as to the
most diagnostic items in the areas sampled.


2. Item difficulty

The difficulty of an item is determined by the proportion of some
standard group able to solve the item correctly. The scaling of
separate test items has been described in Chapter 12, page 301. When
normality of distribution can be assumed for the ability being
measured, single items or groups of items (scores) may be scaled, i.e.,
given difficulty values along a scale in terms of σ. It has been
customary to select items for a test which vary in difficulty from easy to
hard. The average person in the standardization group will then pass
about one-half (50%) of the items in the test. It can be shown,
however, that the sharpest discrimination as between good and poor
subjects is provided by items which are passed by 50% of the members
of a group. A test made up of items all of which are passed by
approximately 50% (but by different persons, of course) would
theoretically be the most discriminating test. But it would be difficult to
construct such an examination and it is probable that a test made up
of items covering a wider range of difficulty is psychologically a
better measuring device. In standardizing a test care must be taken
that few, if any, subjects achieve perfect or zero scores, as in neither
case is the person measured by the test.
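The proportion passing each item, and the number of persons with perfect or zero scores, may be tallied as in the following Python sketch. The four answer records are invented for illustration, and the function names `item_difficulty` and `extreme_scorers` are ours.

```python
def item_difficulty(item_scores):
    """Proportion of the group passing each item (items scored 1 or 0)."""
    n_persons = len(item_scores)
    n_items = len(item_scores[0])
    return [sum(row[j] for row in item_scores) / n_persons for j in range(n_items)]


def extreme_scorers(item_scores):
    """Number of persons with a zero score or a perfect score."""
    n_items = len(item_scores[0])
    return sum(1 for row in item_scores if sum(row) in (0, n_items))


# Hypothetical records: one row per person, one column per item.
records = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(item_difficulty(records))  # prints [0.75, 0.75, 0.25]
print(extreme_scorers(records))  # prints 1
```

Note that a high proportion means an easy item; converting these proportions to σ values requires the scaling procedure of Chapter 12 and is not attempted here.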


3. Item validity 


An often-used method of validating a test item is to determine 
whether the item discriminates between subjects differing sharply in 
the function being measured. This “criterion of internal consistency” 
admits into the final test or questionnaire only those items which 
have been found to separate high-scoring and low-scoring members 
of the group. In an internally consistent test, items “hang together” 
in the sense that they work in the same direction and measure the 
same common trait.* In one study,† eighty-six items were selected
out of 222 on the basis of their ability to discriminate among the
lower, middle, and upper thirds of the group. These eighty-six
“good” items did a better job (higher reliability and validity) than a
test two and a half times as long.
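One simple way of expressing how well an item separates high and low scorers is the difference between the proportions passing it in the upper and lower thirds of the group. The Python sketch below uses this index for illustration only; it is not necessarily the index used in the study cited, the data are invented, and the name `upper_lower_difference` is ours. Total score on the test itself serves as the criterion, so each item contributes to the ranking by which it is judged.

```python
def upper_lower_difference(item_scores, item):
    """Proportion passing `item` in the top third minus that in the bottom third."""
    ranked = sorted(item_scores, key=sum)  # persons ordered by total score
    k = len(ranked) // 3
    if k == 0:
        raise ValueError("at least three persons are needed")
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


# Hypothetical records: six persons, four items scored 1 or 0.
records = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
print(upper_lower_difference(records, item=1))  # prints 1.0
print(upper_lower_difference(records, item=3))  # prints 0.0
```

An item with a difference near zero, like the fourth item here, does not “hang together” with the rest of the test and would be a candidate for removal.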


* Ferguson, G. A., “The Factorial Interpretation of Test Difficulty,”
Psychometrika, 1941, 6, 323-329.
† Anderson, J. E., “The Effect of Item Analysis upon the Discriminative
Power of an Examination,” Journal of Applied Psychology, 1935, 19, 237-244.

The validity of a single test item may also be determined by finding
its correlation with total scores in the test of which it is a part, or
by finding its correlation with scores in some independent criterion.
The bi-serial method (p. 356) is the standard procedure for
determining item validity through correlation. Application of bi-serial r to
each item in a test requires considerable computation, however. For
this reason various short-cut methods for selecting good items by
formula and by graphical methods have been devised. References given
below should be consulted.*


PROBLEMS 


1. The reliability coefficient of a test is .60. 
   (a) How much must this test be lengthened in order to raise the 
       self-correlation to .90? 
   (b) What effect will doubling the test's length have upon its 
       reliability coefficient? tripling the test's length? 
2. A test of fifty items has a reliability coefficient of .78. What is the 
   reliability coefficient 
   (a) of a test having 100 items comparable to the items in the given test? 
   (b) of a test having 125 comparable items? 
3. A given test has a reliability coefficient of .80 and a σ of 20. 
   (a) What is the maximum correlation which this test is capable of 
       yielding as it stands? 
   (b) What is the standard error of a score obtained on this test? 
   (c) What is the estimated reliability coefficient of this test in a 
       group in which the σ is 15? 
4. A test of 100 items is given to a group of 225 subjects with the following 
   results: M = 62.50; σ = 9.62. 
   (a) What is the reliability coefficient of the test by formula (78)? 
   (b) What is the estimated true σ of this test? 
   (c) What is the standard error of a score on this test? 


* Long, John A., and Sandiford, Peter, The Validation of Test Items, 
Bulletin 3, 1935, University of Toronto, Department of Educational Research. 
Flanagan, J. C., “General Considerations in the Selection of Test Items,” 
Journal of Educational Psychology, 1939, 30, 674-680. 
Guilford, J. P., “The Phi Coefficient and Chi-square as Indices of Item 
Validity,” Psychometrika, 1941, 6, 11-19. 
Richardson, M. W., and Adkins, D. C., “A Rapid Method of Selecting Test 
Items,” Journal of Educational Psychology, 1938, 29, 547-552. 
Hawkes, H. E., Lindquist, E. R., and Mann, C. R., Achievement Examinations 
(Boston: Houghton Mifflin Co., 1936), Chapters 2 and 3, especially. 
Gulliksen, H., Theory of Mental Tests, op. cit., Chapter 21. 
Davis, F. B., Item-Analysis Data: their computation, interpretation, and use 
in test construction, Cambridge, Mass.: Harvard Educ. Papers, #2, 1946. 


5. Show (a) that when the reliability coefficient is zero, the standard error 
   of an obtained score equals the standard deviation of the test; and (b) 
   that when the reliability coefficient is 1.00, the standard error of an 
   obtained score equals zero. 
6. A mathematics test has a reliability coefficient of .82, and a mechanical 
   ability test has a reliability coefficient of .76. The r between the two 
   tests is .52. 
   (a) What would the correlation be if both tests were perfect measures? 
   (b) What is the maximum correlation possible with the mathematics 
       test as it stands? 
   (c) What is the maximum correlation possible with the mechanical 
       ability test as it stands? 
7. An intelligence examination shows a correlation of .50 with first-year 
   scholarship. The reliability coefficient of the test is .85, and of school 
   grades (i.e., the criterion) is .65. What is the highest validity coefficient 
   which we can hope to get with this test (i.e., corrected correlation 
   between test and grades)? 
8. A test of seventy-five items has a σ_t of 12.35. The Σpq = 16.46. What 
   is the reliability coefficient by formula (77)? 


ANSWERS 


1. (a) six times 
   (b) r = .75 (doubling length); r = .82 (tripling length) 
2. (a) .88 
   (b) .90 
3. (a) .89 
   (b) 8.9 
   (c) .64 
4. (a) .75 
   (b) 8.34 
   (c) 4.81 
6. (a) .66 
   (b) .91 
   (c) .87 
7. .68 
8. .90 


14 


FURTHER METHODS OF CORRELATION 


+ 


In Chapter 6 we described the linear, or product-moment, correlation 
method, and in Chapter 7 showed how, by means of r and the 
regression equations, one can “predict” or “forecast” values of one 
variable from a knowledge of the other. Test scores, as we have seen, 
represent a series of determinations of a continuous variable taken 
along a numerical scale. The correlation coefficient is valuable to 
psychology and education as a measure of the relationship between 
test scores and other measures of performance. But many situations 
arise in which the investigator does not have scores and must work 
with data in which differences in a given attribute can be expressed 
only by ranks (e.g., in orders of merit), or by classifying an individual 
into one of several descriptive categories. This is especially 
true in vocational and applied psychology and in the field of personality 
and character measurement. Again, there are problems in which 
the relationship among the measurements made is non-linear, and 
cannot be described by the product-moment r. In all of these cases 
other methods of determining correlation must be employed; and the 
purpose of this chapter is to develop some of the more useful of these 
techniques. 


I. Computing Correlation from Ranks 


Differences among individuals in many traits can often be expressed 
by ranking the subjects in one-two-three order when such differences 
cannot be measured directly. For example, persons may be ranked in 
order of merit for honesty, athletic ability, salesmanship, or social 
adjustment when it is impossible to measure these complex behaviors. 
In like manner, various products or specimens, such as advertisements, 
color combinations, handwriting, compositions, jokes, and 
pictures, which are admittedly hard to measure, may be put in order 
of merit for esthetic quality, beauty, humor, or some other character- 
istic. In computing the correlation between two series of ranks, spe- 
cial methods which take account of relative position have been 
devised. These methods may also be applied to scores which have 
been arranged in order of merit. When we have only a few scores 
(less than 25, say), it is often advisable to rank these scores in order 
of merit and compute the correlation by the rank-difference method 
instead of by the longer and more laborious product-moment method. 
Coefficients of correlation calculated from a few cases are not very 
reliable at best, and their chief value lies in suggesting the possible 
existence of relationship, as in a preliminary survey. In such situations 
the rank-difference method will give as adequate a result as that 
obtained by a more refined technique, and is much easier to apply. 


1. Calculation of ρ (rho) from rank-differences 


is to find the relationship between the length of service and the sell- 


of ranking the first man 7 and the second 
oth 8, we compromise by ranking both 7.5 and 


man's efficiency rank and his 
the last column each of these 


* If three men receive the same rank, e.g., 7, 8, 9, each is ranked 8 and the 
next man in order is ranked 10. If four men receive the same rank, e.g., 7, 8, 
9, and 10, each is ranked 8.5 and the next in order is ranked 11. 


TABLE 46  To illustrate the rank-difference method of measuring correlation 


   (1)        (2)          (3)           (4)            (5)          (6) 
                         Order of      Order of      Difference     Diff. 
            Years of      Merit         Merit         between      Squared 
 Salesmen   Service     (Service)    (Efficiency)    Ranks (D)      (D²) 

    A           5          7.5            6             1.5          2.25 
    B           2         11.5           12             -.5           .25 
    C          10          2              1             1.0          1.00 
    D           8          4              9            -5.0         25.00 
    E           6          6              8            -2.0          4.00 
    F           4          9              5             4.0         16.00 
    G          12          1              2            -1.0          1.00 
    H           2         11.5           10             1.5          2.25 
    I           7          5              3             2.0          4.00 
    J           5          7.5            7              .5           .25 
    K           9          3              4            -1.0          1.00 
    L           3         10             11            -1.0          1.00 
  N = 12                                                            58.00 

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 58}{12(143)} = .80 \qquad (86) $$


The correlation between the two orders of merit may be computed by 
substituting for ΣD² and N in the formula 

$$ \rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \qquad (86) $$

(rank correlation coefficient, ρ) 


in which D represents the difference in rank of an individual in the 
two series; ΣD² is the sum of the squares of all such differences; and 
N is the number of cases. Substituting 58 for ΣD² and 12 for N 
in formula (86), we obtain a ρ of .80. The symbol ρ (read as rho) is 
the rank order coefficient of correlation. ρ may be transmuted into a 
product-moment r by means of tables, but the difference between ρ 
and its equivalent r is so small that with little loss of accuracy ρ 
may be taken as approximately equal to r. 
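

The arithmetic of Table 46 can be checked with a short sketch in Python. It is offered only as an illustration: the function names are ours, tied scores are given the mean of the ranks they occupy (the compromise described for tied ranks above), and salesman H's years of service are taken as 2, the value implied by his tied rank of 11.5. 

```python
def average_ranks(values, descending=True):
    """Rank values 1..N; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_rho(ranks_x, ranks_y):
    """Formula (86): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Columns (2) and (4) of Table 46, salesmen A through L.
years_of_service = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]
efficiency_rank = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]

service_rank = average_ranks(years_of_service)
rho = rank_difference_rho(service_rank, efficiency_rank)
print(service_rank)
print(round(rho, 2))
```

Run as written, the sketch should reproduce the service ranks of column (3) and a ρ of about .80. 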


2. The significance of ρ (rho) 


Since ρ is at best only an approximate measure of the relationship 
indicated by r, it is hardly worth while computing its SE. Perhaps 
the best way of estimating the reliability of ρ, if it is wanted, is 
to test the obtained value of ρ against the null hypothesis by means 
of Table 25, p. 200. Thus for the problem in Table 46 we find for 
(N − 2) or 10 df that an r must be .71 to be significant at the .01 
level. Our computed ρ of .80 is considerably larger than .71 and 
hence is statistically significant though based upon only 12 ranks. 


3. Summary on rank-difference correlation 


The product-moment method deals with the size of the score as 
well as its position in the series. Rank-differences, on the other hand, 
take account only of the positions of the items in the series, making 
no allowance for the size of the gaps between adjacent scores. 
Individuals, for example, who score 90, 89, and 70 on a given test are 
ranked 1, 2, 3 in order of merit, although the difference between 90 
and 89 is 1, and the difference between 89 and 70 is 19. Considerable 
accuracy may be lost in translating scores over into ranks, as gaps 
will appear in the rankings when a number of scores, all of the same 
size, receive the same rating. The rank-difference coefficient is rarely 
used with test scores when N is larger than 30 and is often an explor- 
atory and preliminary device. 


II. Measuring Correlation from Data Grouped 
into Categories 


1. Bi-serial correlation 


In many problems it becomes important to calculate the correlation 
between traits or attributes, when the members of the group can 
be measured (i.e., given scores) in the one variable, but can only be 
classified into two categories in the second or “dichotomous” variable. 
(The term dichotomous means “cut into two parts.”) We may, 
for instance, wish to know the correlation between MA and “social 
adjustment” in a group of nursery-school children, when our subjects 
have been given scores in the first trait, but are simply classified as 
“socially adjusted” or “not socially adjusted” in the second trait. 
Other examples of dichotomous classification with reference to some 
attribute are athletic-nonathletic, radical-conservative, socially 
minded-mechanically minded, literate-illiterate, above eighth grade 
in school-below eighth grade, and the like. The correlation between a 
set of scores and two-category classifications like those listed cannot 
readily be found by the ordinary product-moment r or by the 
rank-difference formula. We can, however, compute a bi-serial coefficient 
of correlation if we may assume that the trait in which we have made 
a two-way split would be continuous and normally distributed if 
more information were available. 

Many test and question items are scored to give two responses: 
for example, problems marked Passed or Failed, statements True or 
False, personality inventory items Yes or No, interest items Like or 
Dislike, and so on. When a two-category split cannot be regarded 
as representing a normal distribution but is in fact two separate 
groupings, the point bi-serial r provides a useful measure of relation. 


(1) CALCULATION OF BI-SERIAL r 

The calculation of bi-serial r is illustrated in Table 47. The problem 
is to find the correlation between total scores on a test and the 
answers to a single item in the test (Item 72); or put differently, to 
find whether those who make high scores on the test tend to answer 
Item 72 “Yes” more often than “No.” The first column of Table 47 
gives the class-intervals of the score distribution. Column two gives 
the distribution of scores made by the sixty subjects who answered 
“Yes” to Item 72, and column three the distribution of scores made 
by the forty subjects who answered “No.” The sum of all of the 
frequencies on the score-intervals gives the total distribution of 100 
cases (column four). 


TABLE 47  To illustrate the calculation of the bi-serial r between total 
          scores on a test and the answers to a single item on the test 


  Scores 
  on Test    “Yes”    “No”      f 
   80-84       3                3 
   75-79       4        2       6 
   70-74       6        2       8 
   65-69       5        5      10 
   60-64      10        9      19 
   55-59      10        5      15 
   50-54      15        5      20 
   45-49       4        3       7 
   40-44       3        2       5 
   35-39                4       4 
   30-34                2       2 
   25-29                1       1 
              60       40     100 
             (p)      (q)     (N) 

  M   = 58.05; mean of all scores 
  σ   = 11.63; σ of all scores (N = 100) 
  M_p = 60.08; mean of “Yes” responses (N = 60) 
  M_q = 55.00; mean of “No” responses (N = 40) 
  p   = .60; proportion answering “Yes” to Item 72 
  q   = .40; proportion answering “No” to Item 72 
  z   = .386; height of ordinate separating 60% from 40% in a 
        normal distribution (Table 48) 

$$ r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} = \frac{60.08 - 55.00}{11.63} \times \frac{(.60)(.40)}{.386} = .27 \qquad (87) $$


The steps in calculating bi-serial r from here on are as follows: 


Step 1 

Calculate M_p, the mean of the scores made by the sixty subjects 
who answered “Yes” to Item 72. Also calculate M_q, the mean of the 
scores made by the forty subjects who answered “No” to Item 72. In 
our problem, M_p = 60.08, and M_q = 55.00. 


Step 2 

Calculate the σ of the whole distribution, that is, the distribution of 
the 100 scores. This σ, which equals 11.63, gives the spread of the test 
scores in the entire group. 


Step 3 

Sixty percent of the group (p) answered “Yes” to Item 72, and 
40% (q) answered “No” (p always equals 1 − q). Assuming a normal 
distribution of opinion on this item (varying from complete 
agreement on through indifference to complete disagreement) upon 
which a dichotomous division has been forced, we place the dividing 
line between the “Yes” and “No” groups at a distance of 10% from 
the middle of the curve, as shown in the figure below. 
From Table 48, the height of the ordinate (i.e., z) which is 10% from 
the mean of a normal distribution is .386. 


Step 4 

Having computed M_p, M_q, σ, p, q, and z, we find r_bis from the 
formula 

$$ r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} \qquad (87) $$

(bi-serial coefficient of correlation or bi-serial r) 


in which, as illustrated by the problem above, and shown in Table 47, 


M_p = mean of the group in the first category (usually the group 
      showing superior or more desirable characteristics) 
M_q = mean of the group in the second category 
σ   = standard deviation of the entire group 
p   = proportion of the whole group in category one 
q   = proportion of the whole group in category two (q = 1 − p) 
z   = height of the ordinate in the normal curve dividing p from q 


In Table 47, r_bis is .27, indicating a tendency, though not a strong 
one, for “Yes” answers to Item 72 to accompany high total scores. 
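

Steps 1 to 4 may be sketched in Python as follows. This is an illustrative sketch only: the function names are ours, and instead of reading z from Table 48 it obtains the ordinate from the standard library's `statistics.NormalDist`, which should agree with the table to three decimals. 

```python
from statistics import NormalDist


def ordinate_at_split(p):
    """Height z of the unit normal curve at the point that cuts off proportion p."""
    unit_normal = NormalDist()          # mean 0, standard deviation 1
    cut_point = unit_normal.inv_cdf(p)  # sigma-distance of the dividing line
    return unit_normal.pdf(cut_point)


def biserial_r(mean_p, mean_q, sd_total, p):
    """Formula (87): r_bis = (M_p - M_q) / sigma * (p * q) / z."""
    q = 1 - p
    z = ordinate_at_split(p)
    return (mean_p - mean_q) / sd_total * (p * q) / z


# Summary values from Table 47 (Item 72).
z = ordinate_at_split(0.60)
r_bis = biserial_r(mean_p=60.08, mean_q=55.00, sd_total=11.63, p=0.60)
print(round(z, 3), round(r_bis, 2))
```

With the Table 47 values the sketch gives z of about .386 and r_bis of about .27, in agreement with the worked solution. 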


(2) THE SE OF BI-SERIAL r 

Provided neither p nor q is very small (e.g., smaller than .05), an 
approximate formula for the standard error of bi-serial r is 

$$ SE_{r_{bis}} = \frac{\dfrac{\sqrt{pq}}{z} - r_{bis}^{2}}{\sqrt{N}} \qquad (88) $$

(SE of r_bis for values of p and q greater than .05) 


A comparison of formula (88) with the classical SE_r formula for a 
product-moment r (see p. 197) shows that SE_r_bis is somewhat larger 
than SE_r and becomes increasingly larger as the difference between 
p and q widens: from p = .50, q = .50 to p = .95, q = .05, say. In the 
problem of Table 47, r_bis = .27 and SE_r_bis = .12. To test the 
reliability of this r_bis in terms of its SE, we must assume that the sampling 
distribution of r is normal, put the population r at the center of the 
distribution (Fig. 46, p. 355), and take SE_r_bis to be the SD of the 
sampling distribution of r’s. When we do this, the .95 confidence-interval 
for the true r_bis is from .03 to .51 (r_bis ± 1.96 × SE_r_bis, or .27 ± .24). 
This wide range shows that r_bis is probably indicative of some degree 
of positive correlation (the lower limit of the confidence-interval is 
.03), but it is impossible to say accurately just how much. 
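

The standard error and confidence-interval just described can be sketched in Python. The sketch assumes that formula (88) has the form (√(pq)/z − r²)/√N, which reproduces the SE of .12 quoted above; the function name is ours, and the SE is rounded to two decimals before the interval is formed, as in the text. 

```python
from math import sqrt


def se_biserial(r_bis, p, z, n):
    """Assumed form of formula (88): (sqrt(p*q)/z - r_bis^2) / sqrt(N)."""
    q = 1 - p
    return (sqrt(p * q) / z - r_bis ** 2) / sqrt(n)


# Values for Item 72 (Table 47).
r_bis = 0.27
se = se_biserial(r_bis, p=0.60, z=0.386, n=100)

# .95 confidence-interval, using the SE rounded to two decimals as in the text.
half_width = 1.96 * round(se, 2)
lower, upper = r_bis - half_width, r_bis + half_width
print(round(se, 2), round(lower, 2), round(upper, 2))
```

The sketch should print an SE of .12 and limits of about .03 and .51. 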


TABLE 48  Deviates (x/σ) in terms of σ-units and ordinates (z) for given 
          areas measured from the mean of a normal distribution 
          whose total area = 1.00 


  Area from                       Area from 
  the Mean     x/σ      z         the Mean     x/σ      z 
    .00       .000    .399          .26       .706    .311 
    .01       .025    .399          .27       .739    .304 
    .02       .050    .398          .28       .772    .296 
    .03       .075    .398          .29       .806    .288 
    .04       .100    .397          .30       .842    .280 
    .05       .126    .396          .31       .878    .271 
    .06       .151    .394          .32       .915    .262 
    .07       .176    .393          .33       .954    .253 
    .08       .202    .391          .34       .994    .243 
    .09       .228    .389          .35      1.036    .233 
    .10       .253    .386          .36      1.080    .223 
    .11       .279    .384          .37      1.126    .212 
    .12       .305    .381          .38      1.175    .200 
    .13       .332    .378          .39      1.227    .188 
    .14       .358    .374          .40      1.282    .176 
    .15       .385    .370          .41      1.341    .162 
    .16       .412    .366          .42      1.405    .149 
    .17       .440    .362          .43      1.476    .134 
    .18       .468    .358          .44      1.555    .119 
    .19       .496    .353          .45      1.645    .103 
    .20       .524    .348          .46      1.751    .086 
    .21       .553    .342          .47      1.881    .068 
    .22       .583    .337          .48      2.054    .048 
    .23       .613    .331          .49      2.326    .027 
    .24       .643    .324          .50        ∞      .000 
    .25       .675    .318 


(3) AN ALTERNATIVE FORMULA FOR BI-SERIAL r 


There is another, slightly different, formula for bi-serial r 
which is often useful. This is 

$$ r_{bis} = \frac{M_p - M_T}{\sigma} \times \frac{p}{z} \qquad (89) $$

(bi-serial coefficient of correlation or bi-serial r in terms of M_T, the 
mean of the total group) 

in which 

M_p = mean of the group in the first (or p) category 
M_T = mean of entire group 
σ   = standard deviation of entire group 
p   = proportion of whole group in category one 
z   = height of ordinate in normal curve dividing p from q 


Substituting in formula (89) the values for M_p, M_T, σ, p, and z, 
shown in Table 47, we have 

$$ r_{bis} = \frac{60.08 - 58.05}{11.63} \times \frac{.60}{.386} = .27 $$

which checks our previous result. 

Formula (89) is especially well suited to those problems in which 
sub-groups having different characteristics are drawn from a larger 
group, the larger group mean (M_T) remaining the same. 

The bi-serial correlation method has frequently been used in 
determining item validity,* that is, in finding whether success or failure 
upon a given item is correlated with total score in the test or with 
score in some criterion (Table 47). If those who achieve high scores 
in the criterion get an item right more often than those who make 
low scores, the item will be positively correlated with the criterion. 
Such an item is a good measure of the criterion while one which 
correlates zero or negatively with criterion scores is a poor measure. 


(4) THE POINT BI-SERIAL COEFFICIENT 

When items are scored 1 if correct and 0 if incorrect, that is, as 
either-or, the assumption of normality in the distribution of 
right-wrong responses is unwarranted. In such cases the point bi-serial r 
rather than bi-serial r should be used. The point bi-serial method 
assumes that the behavior which has been classified into two 
categories can be thought of as occurring at two distinct points or modes 
instead of along a graduated scale or continuum. Point bi-serial r† 
has proved to be useful in item analysis. The formula is 

$$ r_{p\,bis} = \frac{M_p - M_q}{\sigma} \times \sqrt{pq} \qquad (90) $$

(point bi-serial coefficient of correlation) 

While (87) is often used in item analysis, (90) is somewhat more 
defensible and is easier to apply. Point bi-serial r’s are lower than 
bi-serial r’s and are not directly comparable either to r_bis’s or to 
product-moment r’s. For example, the validity index of Item 72 (Table 
47) by formula (90) is .21 as compared with the r_bis of .27. 


* Long, J. A., and Sandiford, Peter, The Validation of Test Items, 
Department of Educational Research, University of Toronto, 1935, Bulletin #3, 16-17. 
† Richardson, M. W., and Stalnaker, J. L., “A Note on the Use of Bi-serial r 
in Test Research,” Journal of Genetic Psychology, 1933, 8, 463-465. 
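

The comparison between the two coefficients for Item 72 can be sketched in Python. The sketch assumes that formula (90) has the usual form (M_p − M_q)/σ × √(pq), which reproduces the .21 quoted above; the function name is ours. 

```python
from math import sqrt


def point_biserial_r(mean_p, mean_q, sd_total, p):
    """Assumed form of formula (90): (M_p - M_q) / sigma * sqrt(p * q)."""
    q = 1 - p
    return (mean_p - mean_q) / sd_total * sqrt(p * q)


# Summary values for Item 72 (Table 47).
r_pbis = point_biserial_r(mean_p=60.08, mean_q=55.00, sd_total=11.63, p=0.60)

# Under formulas (87) and (90) the two coefficients differ only by the
# factor sqrt(p*q) / z, so bi-serial r can be recovered from the point value.
z = 0.386
r_bis_from_pbis = r_pbis * sqrt(0.60 * 0.40) / z
print(round(r_pbis, 2), round(r_bis_from_pbis, 2))
```

Because √(pq)/z is always greater than 1, the point bi-serial value is necessarily the smaller of the two, as the text states. 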


2. Tetrachoric correlation 


We have seen in the last section that when one variable is continuous 
and is expressed in the form of test scores, and the other is 
dichotomous or in a twofold classification, bi-serial r provides a 
measure of relationship between the two. An extension of the problem 
of finding correlation between categories to which bi-serial r is 
not applicable presents itself when both variables are dichotomous. 
We then have a 2 × 2 or fourfold table, from which a modified form 
of the product-moment coefficient, called tetrachoric r, may be 
calculated. Tetrachoric r is useful when one wishes to find the 
relationship between two characters or attributes neither of which is directly 
measurable, but both of which are capable of being separated into 
two categories. Thus, if we wish to measure the correlation between 
school attendance and employment, persons might be classified into 
those who have attended high school and those who have not; and 
into those who are employed and those who are unemployed. Or, if 
we wish to discover the correlation between intelligence and social 
maturity, children might be classified as above average and below 
average in intelligence, on the one hand, and as socially mature and 
socially immature on the other. Tetrachoric correlation assumes that 
the two variables being studied are essentially continuous, and would 
be normally distributed if it were possible to classify them more 
exactly into finer groupings. 


(1) CALCULATION OF TETRACHORIC r 


TABLE 49  To illustrate the calculation of tetrachoric r (r_t) 
          (The data are hypothetical) 


                                      X-variable 
                                     100 Salesmen 
                              Unsuccessful    Successful     Totals 
 Y-variable 
   Socially Well Adjusted        25 (b)         35 (a)      60   p = 60% 
   Socially Poorly Adjusted      30 (d)         10 (c)      40   q = 40% 
   Totals                        55             45         100 
                               q′ = 55%       p′ = 45% 

 For p = .60, q = .40, a = .10:    x = −.253, z = .386    (Table 48; Fig. 58) 
 For p′ = .45, q′ = .55, a = .05:  x′ = .126, z′ = .396   (Table 48; Fig. 58) 

$$ \frac{ad - bc}{N^2 z z'} = r_t + \frac{x x'}{2} r_t^2 \qquad (91) $$

$$ \frac{1050 - 250}{100^2 (.386)(.396)} = r + \frac{(-.253)(.126)}{2} r^2 $$

$$ .523 = r - .016 r^2 \quad \text{or} \quad .016 r^2 - r + .523 = 0\,{}^{*} $$

$$ r = \frac{1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016} = \frac{1 \pm .9831}{.032} $$

   = .53 (taking numerator as +1 − .9831) 
   = 62 (taking numerator as +1 + .9831) 

 * The general form of a quadratic equation is ax² + bx + c = 0. The 
 two values of x (i.e., the roots of the equation) may be computed by the 
 formula 

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

 In the equation .016r² − r + .523 = 0, a = .016; b = −1.00; and c = .523. 
 Hence, 

$$ r = \frac{+1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016} = .53 \text{ or } 62 \text{ (an impossible value)} $$


Table 49 illustrates a 2 × 2 fold table, and shows the steps involved 
in calculating tetrachoric r. The problem is to find whether a 
larger number of successful than of unsuccessful salesmen tend to be 
“socially well adjusted.” The data are artificial. The X-variable 
(along the top of the diagram) is divided into two categories, 
“successful” and “unsuccessful”; and the Y-variable (along the left of 
the diagram) is divided into two categories, “socially well adjusted” 
and “socially poorly adjusted.” The sums of the rows show that 
sixty salesmen (a + b) out of the sample of 100 are classed as well 
adjusted socially, and that forty salesmen (c + d) are classed as 
poorly adjusted socially.* The proportions in each category (p 
and q) are 60% and 40%, respectively. The sums of the columns 
show that fifty-five of the 100 salesmen are classified as unsuccessful, 
and forty-five as successful; the proportions are 55% (q′) and 45% 
(p′). On the assumption that “social adjustment” is distributed 
normally, from the proportions p = .60 and q = .40, we obtain an 
x = −.253, and z = .386. These last two values are read from 
Table 48 as follows: The perpendicular line (i.e., the ordinate, z) 
separating the upper 60% from the lower 40% in a normal curve is 
just 10% from the mean. Hence, entering the first column of Table 48 
with a = .10, we read x = −.253 and z = .386. See diagram 
below. 


[FIG. 58: normal curve divided into the “poorly adjusted” and “adjusted” groups at x = −.253] 


The x′ and z′ values corresponding to p′ = .45 and q′ = .55 are 
calculated in the same way. The perpendicular line dividing the 
upper 45% (the percent successful) from the lower 55% (the percent 
unsuccessful) is 5% from the mean; and from Table 48, for a = .05, 
x′ = .126 and z′ = .396. See diagram on page 365. 


An approximate formula for tetrachoric r may be written as follows: 

$$ \frac{ad - bc}{N^2 z z'} = r_t + \frac{x x'}{2} r_t^2 \qquad (91) $$

(approximate formula for tetrachoric r) 

in which 

x and x′ = σ-distances from the means to the points separating the 
           proportion in the upper category from the proportion in 
           the lower category; 
z and z′ = the heights of the ordinates at the points of division; 
a, b, c, d = entries in the four cells, see Table 49; 
N = number of cases; i.e., sum of entries in the four cells; 
r_t = the tetrachoric coefficient of correlation. 


* To accord with the plan of the ordinary correlation table (p. 128), the 
categories in Table 49 have been so arranged that concentration of data in the 
first and third quadrants (a and d) denotes positive correlation; concentration 
of data in the second and fourth (b and c) quadrants negative correlation. 


In Table 49, ad is found to equal 1050, and bc to equal 250. 
Substituting for these quantities, and for x, x′, z, z′, and N in formula (91), 
we obtain r_t = .53. This coefficient indicates a fairly substantial 
correlation between success in salesmanship and social adjustment. In 
order to compute r_t it is necessary that we solve a quadratic 
equation. The method of carrying through this solution is given in 
Table 49 and in the footnote at the bottom of the table. Note that 
only the first of the two solutions for r_t is a possible value, as the 
second is greater than unity. 

The investigator who finds it necessary to calculate many 
tetrachoric r’s may greatly shorten his work by using the computing 
diagrams devised by Thurstone and his co-workers.* These charts 
enable one to obtain a solution for r_t by graphic methods as soon as 
the proportion within each of the four cells of the table is known. 
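

The quadratic solution for r_t can also be sketched in Python. The sketch assumes the form of formula (91) used in Table 49, (ad − bc)/(N²zz′) = r_t + (xx′/2)r_t², and keeps whichever root lies between −1 and +1; the function name is ours, and x, x′, z, z′ are supplied from Table 48 rather than computed. 

```python
from math import sqrt


def tetrachoric_r_approx(a, b, c, d, x, x_prime, z, z_prime):
    """Solve (ad - bc) / (N^2 z z') = r + (x x' / 2) r^2 for the admissible r."""
    n = a + b + c + d
    k = (a * d - b * c) / (n ** 2 * z * z_prime)  # left side of formula (91)
    coef = x * x_prime / 2                          # coefficient of r squared
    if coef == 0:
        return k                                    # equation is linear in r
    # Rearranged: coef * r^2 + r - k = 0, solved by the quadratic formula.
    root_term = sqrt(1 + 4 * coef * k)
    roots = [(-1 + root_term) / (2 * coef), (-1 - root_term) / (2 * coef)]
    return next(r for r in roots if -1 <= r <= 1)


# Table 49: cell entries and the Table 48 readings for p = .60 and p' = .45.
r_t = tetrachoric_r_approx(a=35, b=25, c=10, d=30,
                           x=-0.253, x_prime=0.126, z=0.386, z_prime=0.396)
print(round(r_t, 2))
```

For the Table 49 data the sketch should return about .53, the root near 62 being discarded as impossible. 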


(2) THE SE OF A TETRACHORIC r 

The formula for SE_r_t is mathematically complex and is too long to 
be useful practically. Its derivation can be found in books that deal 
with the mathematics of statistical theory.† If a standard error is 
wanted, an approximation to SE_r_t may be found in the following way. 
When p is close to .50 and N is large, SE_r_t is about 70% higher than 
the SE of a product-moment r of the same size as r_t and based upon 
the same N. The SE of a product-moment r of .53 is .07 for N = 100. 
Hence, the SE_r_t of an r_t = .53 is approximately .12 (.07 × 1.70). 
The .95 confidence-interval for the true r_t is .29 to .77 (i.e., .53 ± 1.96 
× .12 or .53 ± .24). The obtained r_t of .53 is, therefore, indicative of 
a positive r probably as high as .29. 


* Chesire, L., Saffir, M., and Thurstone, L. L., Computing Diagrams for the 
Tetrachoric Correlation Coefficient, University of Chicago Bookstore, 1933. 

† Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their 
Mathematical Bases (New York: McGraw-Hill, 1940), pp. 370-375. 


(3) TETRACHORIC r IN TEST EVALUATION 


Tetrachoric r is often used as a means of evaluating a test’s 
efficiency in separating two contrasted or “criterion” groups. An 
example is given in Table 50 (the data are artificial). The problem is to 
find whether a test of deductive reasoning (here, a syllogism test) 
will differentiate fifty-nine college juniors majoring in science from 
sixty-six college juniors majoring in literature or languages 
(non-science). The X-variable is divided into science majors and 
non-science majors; the Y-variable into those above and those below the 
mean of the test, i.e., the mean score established by the entire junior 
class. The entries in the cells, a, b, c, and d, are expressed in percents, 
so that N² in formula (91) is 1.00. As shown in Table 50, the 
correlation between majoring in science and high scores on the syllogism 
test is .47. If one were investigating a number of tests with a view 
toward determining their relative values as indicators of scientific 
aptitude, the worth of each test could be measured in accordance 
with its ability to separate the two criterion groups.* 


TABLE 50  To illustrate the use of tetrachoric r in evaluating a given test 
          N = 125 


                                X-variable 
                              College Juniors 
                         Non-Science     Science 
                           Majors         Majors       Totals 
 Y-variable 
   Above Test Mean         24 (b)         35 (a)      p = 59% 
   Below Test Mean         29 (d)         12 (c)      q = 41% 
   Totals                 q′ = 53%       p′ = 47% 

 For p = .59, q = .41:     x = −.228, z = .389 
 For p′ = .47, q′ = .53:   x′ = .075, z′ = .398 

$$ \frac{.1015 - .0288}{(.389)(.398)} = r + \frac{(-.228)(.075)}{2} r^2 \qquad (91) $$

$$ .470 = r - .009 r^2 \quad \text{or} \quad .009 r^2 - r + .470 = 0 $$

$$ r = \frac{1 \pm \sqrt{1 - 4(.009)(.470)}}{2(.009)} = \frac{1 \pm .9915}{.018} = .47, \text{ or } 111 \text{ (an impossible value)} $$


3. The phi-coefficient (fourfold point coefficient) 


In a 2 × 2 fold table, the φ-coefficient provides a measure of 
correlation which is equivalent to r. Like the point bi-serial r, phi 
measures relationship between items when the classification is truly 
dichotomous and is concentrated at two separate points or into two 
distinct classes. Phi is sometimes used also with continuous 
variables which have been forced into two categories. 

The diagrams below show the same fourfold tables; in the first the 
entries represent frequencies or scores, in the second proportions. 

            −         +                          −        +
            B         A       A + B              b        a       p
            D         C       C + D              d        c       q
          B + D     A + C                        q′       p′

The formula for phi in terms of frequencies is 

$$ \phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(B+D)(A+C)}} \qquad (92) $$

(phi-coefficient of correlation) 


which expressed in proportions becomes 

$$ \phi = \frac{ad - bc}{\sqrt{p\,q\,p'\,q'}} \qquad (93) $$

The phi-coefficient must always be used to determine the 
significance of the difference between correlated percents or proportions. 
In example (2), page 237, for example, φ = (.12 − .02)/√(p q p′ q′), 


* The phi-coefficient is also useful here. 


or .41. In general, φ is lower than the corresponding r_t’s and is not 
comparable to them. The phi-coefficient for the data in Table 49, for 
example, is .33 as against an r_t of .53; in Table 50 it is .31 as against 
an r_t of .47.* 


Phi is related to χ² by the equation χ² = Nφ² or φ = √(χ²/N). The 
significance of a φ may be estimated, therefore, by converting it into 
a χ² and determining the significance of χ². φ is valuable when we 
want to know how performance on one item is related to performance 
on another item (see problem 5, p. 375). Phi has proved to be 
especially useful in item analysis,† where the values 1 and 0 are usually 
assigned to answers right and wrong. 
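

The phi-coefficient for Table 49 and its conversion to χ² can be sketched in Python. The sketch assumes the cell lettering of Table 49 (a and d on the positive diagonal) and the frequency form of the formula, (AD − BC) over the square root of the product of the four marginal totals; the function name is ours. 

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """phi = (ad - bc) / sqrt of the product of the four marginal totals."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (b + d) * (a + c))


# Table 49: 100 salesmen classified by success and social adjustment.
a, b, c, d = 35, 25, 10, 30
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = n * phi ** 2  # chi-square = N * phi^2, tested with 1 df
print(round(phi, 2), round(chi_square, 2))
```

For these data phi comes out at about .33, as stated above, and the equivalent χ² is about 10.8. 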


4. The contingency coefficient, C 


The coefficient of contingency, C, is used to determine relationship 
when the variables under study have been put into two or more 
classes or categories. The contingency coefficient can be derived 
directly from χ² (p. 254); but C differs from χ²‡ in that it provides a 
measure of correlation which under certain conditions (p. 371) is 
comparable to product-moment r. C bears the following relation 
to χ²: 

$$ C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (94) $$

(formula for C, the contingency coefficient in terms of χ²) 


In Table 33, page 263, the association between eyedness and 
handedness was found to be expressed by a χ² of 4.02, which for 4 df was 
not significant. By formula (94) the C for Table 33 is √(4.02/(N + 4.02)), 
or .10 (to two decimals). Taken at face value and alone, this 
would indicate a negligible relationship between eyedness and 
handedness. The SE needed to test C is a complex expression laborious 
to compute;§ so that the significance of C is best tested by its 
equivalent χ². In the present problem, the χ² of 4.02 is not significant and 
in consequence our C of .10 is not significant. 

* Guilford, J. P., and Perry, N. C., “Estimation of other coefficients of correlation
from the phi coefficient,” Psychometrika, 1951, 16, 335-346.

† Guilford, J. P., ed., Printed Classification Tests, Report No. 6, Army Air
Forces Aviation Psychology Program Research Reports (Washington, D. C.:
U. S. Gov't Printing Office, 1947).

‡ $\chi^2$ is a measure of probability of association.

§ Kelley, T. L., Fundamentals of Statistics (Cambridge, Mass.: Harvard University
Press, 1947).


(1) METHOD OF CALCULATING C


Table 51 illustrates the computation of C from a 4 × 4 fold contingency
table. The table gives the classification of 1000 fathers and
sons with respect to eye color. The independence values for each cell
have been computed as shown in Table 51. From the top row,
for example, we know that 335/1000 of all sons are described as
blue-eyed. This proportion of 358 (i.e., $\frac{335 \times 358}{1000}$) gives 120 as the
number of fathers who can be expected to have blue-eyed sons “by
chance,” as contrasted with the 194 fathers who actually did have
blue-eyed sons. When the independence values have been found, we
square each obtained cell entry, and divide by its own independence
value as shown in Table 51. The sum of these quotients gives S; and
from S and N, C is calculated by the formula

$$C = \sqrt{\frac{S - N}{S}} \qquad (95)$$

(formula for C, coefficient of contingency, calculated directly)


TABLE 51 To illustrate the calculation of C, the coefficient of contingency

                           Father's Eye Color
Son's Eye Color    Blue         Gray         Hazel        Brown        Totals
Blue               194 (120)     70 (88)      41 (60)      30 (66)       335
Gray                83 (102)    124 (75)      41 (51)      36 (56)       284
Hazel               25 (49)      34 (36)      55 (25)      23 (27)       137
Brown               56 (87)      36 (64)      43 (44)     109 (48)       244
Totals             358          264          180          198           1000

I. Independence Values

(335 × 358)/1000 = 120   (284 × 358)/1000 = 102   (137 × 358)/1000 = 49   (244 × 358)/1000 = 87
(335 × 264)/1000 = 88    (284 × 264)/1000 = 75    (137 × 264)/1000 = 36   (244 × 264)/1000 = 64
(335 × 180)/1000 = 60    (284 × 180)/1000 = 51    (137 × 180)/1000 = 25   (244 × 180)/1000 = 44
(335 × 198)/1000 = 66    (284 × 198)/1000 = 56    (137 × 198)/1000 = 27   (244 × 198)/1000 = 48

II. Calculation of C

(194)²/120 = 313.6   (83)²/102 = 67.5    (25)²/49 = 12.8    (56)²/87 = 36.0
(70)²/88 = 55.7      (124)²/75 = 205.0   (34)²/36 = 32.1    (36)²/64 = 20.3
(41)²/60 = 28.0      (41)²/51 = 33.0     (55)²/25 = 121.0   (43)²/44 = 42.0
(30)²/66 = 13.6      (36)²/56 = 23.1     (23)²/27 = 19.6    (109)²/48 = 247.5

S = 1270.8
N = 1000

$$C = \sqrt{\frac{S - N}{S}} = \sqrt{\frac{270.8}{1270.8}} = .46$$


In Table 51, C is .46. From (94) the $\chi^2$ corresponding to this C is
268, which for 9 df is highly significant, far beyond the .01 level
(Table E).
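The computation laid out in Table 51 can be retraced with a short Python sketch. The cell frequencies are those of Table 51; the variable names are illustrative only. The sketch carries the independence values unrounded, so its S differs slightly from the 1270.8 obtained in the table with rounded values, but C still comes out at .46. It also checks that formulas (94) and (95) agree, since S − N equals $\chi^2$.

```python
import math

# Observed frequencies from Table 51: rows are son's eye color,
# columns are father's eye color (Blue, Gray, Hazel, Brown).
observed = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# S is the sum of (observed squared) / (independence value) over all cells.
s = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        independence_value = row_totals[i] * col_totals[j] / n
        s += obs ** 2 / independence_value

c_direct = math.sqrt((s - n) / s)                        # formula (95)
chi_square = s - n                                       # S - N equals chi-square
c_from_chi = math.sqrt(chi_square / (n + chi_square))    # formula (94)

print(f"S = {s:.1f}, chi-square = {chi_square:.1f}, C = {c_direct:.2f}")
```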

C possesses certain advantages over $\phi$ and $r_t$. In computing C,
for example, no assumption of normality in the distributions of the 
variables classified need be made; in fact any type of distribution, 
skewed or rectangular, may be utilized. C may be either plus or 
minus, the sign of the coefficient depending upon an inspection of the 
contingency table itself. In Table 51 it is clear that pigmentation of 
eyes in father and son is positively correlated * and that C must be 
positive. 

* We note, for example, that 194 blue-eyed fathers have blue-eyed sons,
while only 30 brown-eyed fathers have blue-eyed sons. Moreover, 109 brown-eyed
fathers have brown-eyed sons while only 56 blue-eyed fathers have brown-eyed
sons. Comparisons of this sort will show that association between
pigmentation in the eyes of father and son is positive.

A disadvantage of C is that it does not remain constant for the
same data when the number of classes varies. The C computed from
a 2 × 2 or 3 × 3 table will ordinarily not be comparable to the C
computed for the same data from a 5 × 5 table, say. Furthermore,
the maximum value which C can take depends upon the fineness of
the classification used so that C is not directly comparable to bi-serial
r or to $r_t$. It can be shown that

when the number of classes = 2, the maximum C is .707
when the number of classes = 3, the maximum C is .816
when the number of classes = 4, the maximum C is .866
when the number of classes = 5, the maximum C is .894
when the number of classes = 6, the maximum C is .913
when the number of classes = 7, the maximum C is .926
when the number of classes = 8, the maximum C is .935
when the number of classes = 9, the maximum C is .943
when the number of classes = 10, the maximum C is .949
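The maxima listed above follow a simple rule. The text does not state the formula, but for a table with k classes in each direction the standard result is that the largest attainable C is $\sqrt{(k-1)/k}$. A minimal Python sketch (the function name is illustrative) reproduces the listed values:

```python
import math


def max_contingency(k):
    """Largest possible C for a k x k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


for k in range(2, 11):
    print(f"classes = {k:2d}, maximum C = {max_contingency(k):.3f}")
```

Because the ceiling rises slowly toward 1.00 as k increases, a C from a coarse table is compressed relative to one from a fine table of the same data.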


In the light of this table, Yule and Kendall * recommend that we
“restrict the use of the ‘coefficient of contingency’ to 5 × 5 or finer
classifications” in order that the maximum value of C may be as near
unity as possible. At the same time, we should avoid a too-fine classification
or C will be affected by slight or “casual irregularities of no
physical significance”; and, in addition, the arithmetic of calculation
will be greatly (and needlessly) increased. A correction † for “broad
categories” may be applied to C's calculated from 4 × 4 fold or
broader groupings if C is to be compared with r. For 5 × 5 fold or
finer classifications, this correction is so small that for practical purposes
it may be disregarded.

The relation of C to r is, under certain conditions, very close. C is
substantially equivalent to r (1) when the grouping is relatively
fine, 5 × 5 fold or finer; (2) when the sample is large; (3) when the
two variables may legitimately be classified into categories; and (4)
when we are justified in assuming that the variables under investigation
are normally distributed.


* Yule, G. U., and Kendall, M. G., An Introduction to the Theory of Statistics
(12th ed.; London: C. Griffin, 1940).

† Peters, C. C., and Van Voorhis, W. R., op. cit., pp. 391-393.


II. Curvilinear or Non-Linear Relationship


The relationship between the paired values of two sets of measures,
X and Y, may be described in a general way as “linear” or “non-linear.”
When the means of the arrays of the successive columns and
rows in a correlation table follow straight lines (at least approximately),
the regression is said to be linear or straight-line (p. 133).
When the drift or trend of the means of the arrays (columns or rows)
cannot be well described by a straight line, but can be represented by
a curve of some kind, the regression is said to be curvilinear or in
general non-linear.

Our discussion in Chapter 6 was concerned entirely with linear
relationship, the extent or degree of which is measured by the product-moment
coefficient of correlation, r. It sometimes happens in
mental measurement, however, that the relationship between two
variables is definitely non-linear; and when this is true, r is not an
adequate measure of the degree of correspondence or correlation.
When the regression is non-linear, a curve joining the means of successive
arrays (in the columns, say) will fit these mean values more
exactly than will a straight line. Hence, should a truly curvilinear
relationship be described by a straight line, the scatter or spread of
the paired values about the regression line will be greater than the
scatter about the better-fitting regression curve. The smaller the
spread of the paired scores about the regression line or the regression
curve which relates the variables X and Y (or Y and X), the higher
the relationship between the two variables. For this reason, an r calculated
from a correlation table in which the regression is curvilinear
will always be less than the true relationship. An example will make
this situation clearer. The correlation between the following two
short series, as given by the product-moment formula, is r = .93
[formula (24), p. 139].

Variable X    Variable Y
    1             .25
    2             .50
    3            1.00
    4            2.00
    5            4.00

The true correlation between the two series,
however, is clearly perfect, since changes in Y are directly related to
changes in X. As X increases by 1 (i.e., in arithmetic progression)
Y doubles (i.e., increases in geometric progression). The reason why
r is less than 1.00 becomes obvious as soon as we plot the paired X
and Y values. As shown in Figure 60, the relationship between X
and Y is curvilinear, and is exactly described by a curve which
passes through the successively plotted points. When linear relationship
is forced upon these data, the plotted points do not fall along the
straight line, and the product-moment coefficient, r, is less than 1.00.
However, the correlation-ratio, or coefficient of non-linear relationship
$\eta$ (read as eta) for the given data is 1.00.
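The r of .93 for a doubling series can be checked with a short Python sketch. The Y values used here (.25 up to 4.00) are an assumption based on the description above; any five-term doubling series gives the same r, because the coefficient is unaffected by a change of scale. The second calculation, correlating X with the logarithm of Y, is an added illustration and not a method given in the text: it simply shows that the relationship is perfect once the curve is straightened.

```python
import math


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


x_values = [1, 2, 3, 4, 5]
y_values = [0.25, 0.50, 1.00, 2.00, 4.00]   # assumed doubling series

r_linear = pearson_r(x_values, y_values)
r_straightened = pearson_r(x_values, [math.log2(y) for y in y_values])

print(f"r for X and Y        = {r_linear:.2f}")
print(f"r for X and log2(Y)  = {r_straightened:.2f}")
```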

True non-linear relationship is encountered in psychophysics and
in experiments dealing with fatigue, practice, forgetting, and learning.
Whenever an experiment is carried on to the point of diminishing
returns, relationship will necessarily be curvilinear. Most mental
and educational tests, however, when administered to large samples,
exhibit linear or approximately linear relationships. The coefficient
of correlation, r, therefore, has been employed in psychology and
education to a far greater extent than has $\eta$; and for this reason the
calculation of $\eta$ is not given here.* If regression is significantly
non-linear, it makes considerable difference whether $\eta$ or r is the
measure of relation. But if the correlation is low and the regression
not significantly curvilinear, r will give as adequate a measure of
relationship as $\eta$.


FIG. 60 To illustrate non-linear relationship (X-variable on the horizontal axis, Y-variable on the vertical axis)


The coefficient of correlation has the advantage over $\eta$ in that
knowing r we can write down at once the straight-line regression
equation connecting X and Y or Y and X. This is not possible with
the correlation ratio. In order to estimate one variable from another
(say, Y from X) when regression is non-linear, a curve must be fitted
to the means of the Y-columns. The equation of this curve then
serves as a “regression equation” from which estimates can be made.


* See references, page 453.
† See Chapter 7.


PROBLEMS

1. Compute the correlation between the following two series of test scores
by the rank-difference method and test its significance.

Individual    Intelligence Score    Cancellation Score (… Test)
    1               185                   110
    2               203                    98
    3               188                   118
    4               195                   104
    5               176                   112
    6               174                   124
    7               158                   119
    8               197                    95
    9               176                    94
   10               138                    97
   11               126                   110
   12               160                    94
   13               151                   126
   14               185                   120
   15               185                   118

[Note: The cancellation scores are in seconds; hence the two smallest scores
numerically (i.e., 94) are highest and are ranked 1.5 each.]

2. Check the product-moment correlations obtained in problems 6 and 7,
pages 150-151, Chap. 6, by the rank-difference method.

3. The following data give the distributions of scores on the Thorndike
Intelligence Examination made by entering college freshmen who presented
12 or more recommended units, and entering freshmen who presented
less than 12 recommended units. Compute bi-serial r by formula
(87) and test its significance.

Thorndike Scores    12 or more recommended units    Less than 12 recommended units
    90-99                      6                                0
    80-89                     19                                3
    70-79                     31                                5
    60-69                     58                               17
    50-59                     40                               30
    40-49                     18                               14
    30-39                      9                                7
    20-29                      5                                4
                             186                               80


4. The following data give the distributions of scores on a general educational
achievement test made by those who answered 50% or more, and
those who answered less than 50% of the items in an arithmetic test
correctly. Compute bi-serial r and test its significance.

Achievement Test    Subjects answering 50% or more       Subjects answering less than 50%
                    of the items on arithmetic test      of the items on arithmetic test
                    correctly                            correctly
    185-194                    7                                    0
    175-184                   16                                    0
    165-174                   10                                    6
    155-164                   35                                   15
    145-154                   24                                   40
    135-144                   15                                   26
    125-134                   10                                   13
    115-124                    3                                    5
    105-114                    0                                    5
                             120                                  110


5. Compute tetrachoric r's for the following tables and test for significance.

(1) Relation of alcoholism and health in 811 fathers and sons. Entries
are expressed as proportions.

                                  Sons
Fathers            Unhealthy    Healthy    Totals
Non-Alcoholic        .343        .405       .748
Alcoholic            .102        .151       .252
Totals               .445        .556      1.000

(2) Correspondence of Yes and No answers to two items of a neurotic
inventory.

                       Question 1
Question 2       No      Yes     Totals
Yes              83      187      270
No              102       93      195
Totals          185      280      465

6. (a) Compute the $\phi$-coefficients for the two tables in example (10),
p. 245. Test the significance of $\phi$ by method on p. 368.

(b) Compute the $r_{p,bis}$ for example (4), above.

(c) Compute $\phi$ for the table in 5 (2) above.


7. Calculate the coefficient of contingency, C, for the two tables given
below.

(1)                    Marriage-Adjustment Score of Husbands
                   Very Low    Low    High    Very High    Totals
Graduate work          4         9     38        54         105
College               20        31     55        99         205
High School           23        37     41        51         152
Grade School          11        10     11        19          51
Totals                58        87    145       223         513

(2)                       Kind of Music Preferred
              English    French    German    Italian    Spanish    Totals
English          32        16        75        47         30        200
French           10        67        42        41         40        200
German           12        23       107        36         22        200
Italian          16        20        44        76         44        200
Spanish           8        53        30        43         66        200
Totals           78       179       298       243        202       1000

8. Convert the C's in example 7, above, to $\chi^2$'s and test for significance.

9. Compute C for example 3, Chapter 6, page 149.

10. (a) In the following table, compute r by the product-moment method.
(b) Plot the relationship between X and Y as shown in Figure 60,
page 373. Is the relation linear?

    X     Y
    1     1
    2     2
    3     4
    4     8
    5    16
    6    32
ANSWERS

1. $\rho$ = .19; not significant
3. $r_{bis}$ = .34; $SE_r$ = .07; significant at .01 level
4. $r_{bis}$ = .47; $SE_r$ = .07; significant at .01 level
5. (1) $r_t$ = −.09; not significant; $SE_{r_t}$ = .06 (approximately)
   (2) $r_t$ = .33; $SE_{r_t}$ = .07 (approximately). Significant at .01 level.
6. (b) $r_{p,bis}$ = .38 (c) $\phi$ = .22
7. (1) C = .24 (2) C = .40
8. (1) Significant by $\chi^2$ test at .01 level
   (2) Significant by $\chi^2$ test at .01 level
9. C = .72
10. (a) r = .85 (b) Relationship is non-linear


PARTIAL AND MULTIPLE CORRELATION 




I. The Meaning of Partial and Multiple Correlation


Partial and multiple correlation represent an important extension 
of the theory and technique of simple or two-variable correlation to 
problems which involve three or more variables. In computing the 
correlation between two sets of scores, it is often desirable to allow 
for the influence of factors which through their common relationship 
to the variables being correlated obscure results or make them diffi- 
cult to interpret. To illustrate, suppose that the correlation between 
intelligence test scores and chronological age in a large group of chil- 
dren, seven to fourteen years old, is .50; that the correlation between 
school achievement and age in the same group is .40; and that the 
correlation between intelligence and school achievement is .70. Since 
intelligence test scores and school achievement both increase with 
age (the correlations are .50 and .40) the correlation between these 
two measures will be raised when age is allowed to vary. The corre- 
lation coefficient of .70, therefore, is not only a measure of the role of 
intelligence in school achievement, but is а measure of the influence 
of intelligence plus the indirect effects of differences іп age or matur- 
ity upon school achievement. 

То discover the relationship between intelligence and school 
achievement, uninfluenced by maturity, we must rule out or control 
the factor of age. This could be accomplished experimentally by 
selecting children all of whom are of the same age. But this proce- 
dure offers many difficulties, the principal one being that it is well- 
nigh impossible to find a large sample of children of exactly the same 
age. It becomes necessary, then, to determine what age range is per- 
missible; and the more closely we limit our group with respect to age: 


379 
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7% the smaller the number left. In fact, the experimental control of & 
variable by the method of selection may so limit the size of the group 
that correlations are of doubtful value. 

Because of the difficulties which arise in attempting to control a
variable (or variables) experimentally, the method of partial correlation
is often employed. By this method the relationship between two
variables can be determined when one or more related variables are
held constant. Thus, the partial correlation between general intelligence
and school achievement, i.e., the correlation with age “partialled
out,” gives us the correlation between these two variables uninfluenced
by the factor of age differences. Such a partial coefficient
represents the net correlation between general intelligence and school
achievement for children of the same age; or the net correlation
between intelligence and school achievement when age is a constant
factor. Expressed in still another way, our partial coefficient tells us
what relationship exists between general intelligence test scores and
school achievement when differences in maturity no longer affect
either variable.
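For the three correlations given above (.70, .50, and .40), the partial coefficient can be worked out with the first-order formula that the chapter introduces later as formula (96). The text does not report this value, so the result printed here is only a calculation from that formula; the function and variable names are illustrative.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant (first-order partial)."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2))


r_intelligence_achievement = 0.70
r_intelligence_age = 0.50
r_achievement_age = 0.40

r_net = partial_r(r_intelligence_achievement, r_intelligence_age, r_achievement_age)
print(f"intelligence and achievement, age partialled out: {r_net:.2f}")
```

The net coefficient is lower than .70, which agrees with the statement that the whole correlation is raised when age is allowed to vary.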
A second illustration of partial correlation may be helpful. A
teacher finds in her class a correlation of .60 between test scores in
history and arithmetic. In looking for an explanation of this correlation
(since there is apparently little reason to expect a high relationship
between these two abilities), she finds that achievement in arithmetic
seems to depend in part upon ability to read and understand the
problems. Obviously, ability to read well is also an important factor
in determining achievement in history. Suppose that our teacher
calculates the correlations of history and arithmetic with a third
test, namely, one of reading comprehension. Knowing these r's, she
may determine (by methods given on p. 387) the net or partial correlation
between history and arithmetic when differences in reading
comprehension have been allowed for. If this partial coefficient is
.30, say, considerably smaller than the “whole” coefficient (of .60)
between history and arithmetic, the hypothesis that the apparent
relationship was due in part to the common dependence of both tests
upon reading is verified. When a factor (or factors) is “partialled
out” from a given correlation the effect is to eliminate the differences
among individuals introduced by the variable thus controlled. The
method of eliminating factor variability through partial correlation
may be employed whenever the correlation can be computed between
the factor or factors to be controlled and the two variables the net
correlation of which we are seeking. Since all of the data are utilized,
partial correlation has a decided advantage over experimental control
in many problems.

In addition to its value as a means of controlling conditions by 
eliminating the effects of “disturbing” or other variables, partial cor- 
relation is useful in other ways. It enables us, for example, to build 
up a regression equation involving three or more variables from which 
a “criterion” score may be predicted when we know the scores made 
by a subject on several correlated tests. The accuracy of the regression
equation in estimating criterion scores (its reliability as a “prediction”
instrument) can be determined by the coefficient of multiple
correlation. A multiple correlation coefficient gives the correlation
between a single test or criterion on the one hand and a team of
tests on the other. The meaning of the multiple coefficient of correla- 
tion will be better understood when the student has worked through 
an actual problem such as that given in Table 52. 


II. An Illustrative Correlation Problem Involving Three Variables


Perhaps the most straightforward approach to an understanding 
of the meaning of partial and multiple correlation, and of the tech- 
niques of calculation involved, is through the solution of a problem. 
The present section, therefore, will show the application of partial 
and multiple correlation to a three-variable problem. Following this, 
the general formulas and further applications of the method will be 
considered. 

The problem in Table 52 is taken from a study * of the factors
which influence “academic success.” In that part of the study from
which the present data are drawn, the problem was to discover how
accurately one can predict the academic success of freshmen from a
knowledge of their general intelligence and of their study habits.
Academic success was defined specifically as the number of credit or
“honor” points obtained by a student at the end of his first semester
in college. The number of honor points earned depended upon the
number of A, B, and C grades made by the student in his freshman
courses. A grade of A carried three honor points; a grade of B two
honor points; a grade of C one honor point; and a grade of D, which
was a passing mark, carried no honor point credit. The maximum
number of points which a freshman taking the regulation number
of courses in one semester could obtain was forty-eight.


* May, M. A., “Predicting Academic Success,” Journal of Educational Psychology,
1923, 14, 429-440.


TABLE 52 A correlation problem involving three variables

(To illustrate partial and multiple correlation)

Step 1. Primary Data (N = 450)

(1) Honor Points      (2) General Intelligence      (3) Average Hours of Study per Week
$M_1 = 18.5$          $M_2 = 100.6$                 $M_3 = 24$
$\sigma_1 = 11.2$     $\sigma_2 = 15.8$             $\sigma_3 = 6$
$r_{12} = .60$        $r_{13} = .32$                $r_{23} = -.35$

Step 2. Calculation of Partial Coefficients of Correlation

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .80 \qquad (96)$$

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.32 - .60(-.35)}{.8000 \times .9367} = .71 \qquad (96)$$

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{(-.35) - .60 \times .32}{.8000 \times .9474} = -.72 \qquad (96)$$

Step 3. The Regression Equations and Partial Regression Coefficients

$$\bar{x}_1 = b_{12.3}x_2 + b_{13.2}x_3 \quad \text{(Deviation Form)} \qquad (98)$$

$$\bar{X}_1 = b_{12.3}X_2 + b_{13.2}X_3 + K \quad \text{(Score Form)} \qquad (99)$$

in which $b_{12.3} = r_{12.3}\dfrac{\sigma_{1.23}}{\sigma_{2.13}}$ and $b_{13.2} = r_{13.2}\dfrac{\sigma_{1.23}}{\sigma_{3.12}}$ (102)

Step 4. Calculation of the Partial $\sigma$'s

(1) $\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 11.2 \times .8000 \times .7042 = 6.3$ (97)
(2) $\sigma_{2.13} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 15.8 \times .9367 \times .6000 = 8.9$ (97)
(3) $\sigma_{3.12} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} = 6 \times .9367 \times .7042 = 4.0$ (97)

Step 5. Calculation of the Partial Regression Coefficients, and Partial
Regression Equation

Substituting for $r_{12.3}$, $r_{13.2}$, $\sigma_{1.23}$, $\sigma_{2.13}$, $\sigma_{3.12}$ we have

$b_{12.3} = .80 \times \dfrac{6.3}{8.9} = .57$; $b_{13.2} = .71 \times \dfrac{6.3}{4.0} = 1.12$

Hence the regression equation becomes:

$\bar{x}_1 = .57x_2 + 1.12x_3$ (Deviation Form)

or $\bar{X}_1 = .57X_2 + 1.12X_3 - 66$ (Score Form)

Step 6. Calculation of the Standard Error of Estimate

$\sigma_{(est.\ X_1)} = \sigma_{1.23} = 6.3$ (105)

Step 7. Calculation of the Coefficient of Multiple Correlation

$$R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = \sqrt{1 - \frac{(6.3)^2}{(11.2)^2}} = .83 \qquad (107)$$


General intelligence was measured by a combination of the Miller
Mental Ability Test and the Dartmouth Completion of Definitions
Test. The first test contains 120 items and the second 40, so that the
maximum score was 160. The scores of the 450 students in this sample
ranged from 50 to 150, the distribution being fairly normal. As
a measure of interest and application it was decided to take the
average number of hours per week spent in study. Information with
regard to study habits was obtained by means of a questionnaire
given at the beginning and again at the middle of the first semester.
Among other items in the questionnaire upon which information was
requested were the number of hours spent per week at meals, in
sleeping, etc. These and other questions were included in order that
the student might think that he was being checked upon the distribution
of his total time and not upon his study habits alone. The correlation
between the student's estimates of the number of hours spent
in study, given on the first and second questionnaires, was .86, indicating
a satisfactory degree of reliability.

As stated above, the main object of this study was to find how
accurately the number of honor points which a student earns can be
predicted from a knowledge of his study habits and his general intelligence.
Other factors, of course, such as health, personality, previous
preparation, and the like, are undoubtedly of importance in determining
the number of honor points received. The two factors
selected were chosen because they are important and are also objective
and measurable. As the first step in solving our problem, we
shall calculate the partial coefficient which shows to what extent
honor points are related to general intelligence when the variable
factor of study hours per week is held constant. Next the partial
coefficient will be calculated which shows to what extent honor points
are related to study hours when the variable effect of general intelligence
is rendered constant. Apart from the employment of these
partial coefficients in the regression equation from which we predict
honor points, the information which they yield will prove in itself to
be of considerable interest. The solution of the problem is outlined in
the following series of steps; the necessary data and calculations will
be found in Table 52.


Step 1 
The mean and $\sigma$ of each series of measures and the intercorrelations
are first calculated. These intercorrelations are product-moment
r's computed as shown in Chapter 6. The correlation between
(1) honor points and (2) general intelligence, written $r_{12}$, is
.60; the correlation between (1) honor points and (3) the number of
hours per week spent on the average in study, written $r_{13}$, is .32; and
the correlation between (2) general intelligence and (3) hours of
study per week, written $r_{23}$, is −.35. The low correlation between
honor points and study hours is of decided interest; but the most
surprising correlation is the −.35 between study hours and general
intelligence. Evidently the brighter the student, the less he
studies.


Step 2 


Having found the intercorrelations of our three variables, we may
then calculate the net correlation between (1) honor points and (2)
general intelligence with the influence of (3) study hours partialled
out or held constant. This net or partial coefficient of correlation,
written $r_{12.3}$, is found from the following formula:

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad \text{(96), page 388}$$

Substitution of the values for $r_{12}$, $r_{13}$, and $r_{23}$ in the formula gives a
partial coefficient, $r_{12.3}$, of .80. This means that if all of our 450 students
had studied exactly the same number of hours per week, the
coefficient of correlation between honor points earned and general
intelligence test scores would have been .80 instead of .60. In other
words, when all students spend the same number of hours in study,
there is a closer correspondence between general intelligence test
score and honor points earned than there is when the number of
study hours varies.

The partial coefficient of correlation between (1) honor points and
(3) hours spent in study per week with (2) general intelligence partialled
out, or its influence held constant, is found from the formula

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \qquad (96)$$

Substitution of the values for $r_{12}$, $r_{13}$, and $r_{23}$ gives a partial coefficient,
$r_{13.2}$, of .71, as against an obtained coefficient ($r_{13}$) of .32. This
result means that if our group possessed the same general intelligence *
there would be a much closer correspondence between the
number of honor points received and the number of hours spent in
study than there is when the members of the group possess varying
degrees of intelligence. This is certainly the result to be expected.


* By “same general intelligence” is meant the same score on the given general
intelligence tests.

The last partial coefficient of correlation, $r_{23.1}$, equals −.72. This
coefficient gives the net correlation between (2) general intelligence
and (3) study hours when the influence of (1) honor points is held
constant. It is found from the formula

$$r_{23.1} = \frac{r_{23} - r_{12}r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} \qquad (96)$$

Like the two partial r's above, we may interpret $r_{23.1}$ to mean that
the correlation between general intelligence and hours spent in study
in a group in which every student earns the same number of honor
points would be much higher (in the inverse direction) than the
“raw” correlation between the same two factors in an unselected
group. By an unselected group is meant here a group in which the
number of honor points received by different students varies. It
seems evident that the brighter student not only studies less than the
average and dull (since $r_{23}$ = −.35) but that the brighter the student,
the less he needs to study in order to reach a given standard of academic
success, that is, to earn a given number of honor points.
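The three partial coefficients of Step 2 can be verified together with a small Python sketch of formula (96). The function name and argument order are illustrative; the intercorrelations are those of Table 52.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation between a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2))


# Intercorrelations from Table 52:
# (1) honor points, (2) general intelligence, (3) study hours per week.
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)   # honor points and intelligence, study hours constant
r13_2 = partial_r(r13, r12, r23)   # honor points and study hours, intelligence constant
r23_1 = partial_r(r23, r12, r13)   # intelligence and study hours, honor points constant

print(f"r12.3 = {r12_3:.2f}, r13.2 = {r13_2:.2f}, r23.1 = {r23_1:.2f}")
```

Only the order of the arguments changes from one coefficient to the next: the first argument is always the correlation being corrected, and the other two are its correlations with the variable held constant.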


Step 3 


Knowing the partial coefficients of correlation, we may write the
multiple regression equation from which the most probable number of
honor points a student will receive may be estimated when we know
his score in the general intelligence test and the number of hours he
studies per week. The regression equation for three variables (in
deviation form) is as follows:

$$\bar{x}_1 = b_{12.3}x_2 + b_{13.2}x_3 \qquad \text{(98), page 391}$$

In this equation $\bar{x}_1$ stands for honor points and is the dependent
variable or criterion; $x_2$ and $x_3$ stand for general intelligence and
study hours, respectively, and are the independent variables. Note
the resemblance of this equation to the simple regression equation for
two variables $y = b_{yx} \times x$ (p. 155). If $\bar{x}_1$ is put for y, and $x_2$ for x in
the two-variable equation, we have $\bar{x}_1 = b_{12} \times x_2$.

When written in score form, the multiple regression equation for
three variables becomes

$$(\bar{X}_1 - M_1) = b_{12.3}(X_2 - M_2) + b_{13.2}(X_3 - M_3)$$

or transposing and collecting terms,

$$\bar{X}_1 = b_{12.3}X_2 + b_{13.2}X_3 + K \text{ (a constant)} \qquad \text{(99), page 391}$$

It is clear that before we can use this equation we must find the
value of the partial regression coefficients $b_{12.3}$ and $b_{13.2}$. These may
be found from the formulas

$$b_{12.3} = r_{12.3}\frac{\sigma_{1.23}}{\sigma_{2.13}} \quad \text{and} \quad b_{13.2} = r_{13.2}\frac{\sigma_{1.23}}{\sigma_{3.12}} \qquad \text{(102), page 392}$$

and, as we already have the values of $r_{12.3}$ and $r_{13.2}$, it is only necessary
that we find $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$ (the partial $\sigma$'s) in order to
replace the partial regression coefficients in the equation by numerical
values.

Note that the partial coefficient of correlation $r_{23.1}$, although of
interest as giving us the relation between general intelligence and
hours spent in study for a constant number of honor points earned, is
not actually needed in the regression equation $\bar{x}_1 = b_{12.3}x_2 + b_{13.2}x_3$.
In order to evaluate the constants $b_{12.3}$ and $b_{13.2}$ in our regression
equation, we need only $r_{12.3}$ and $r_{13.2}$. In fact, in any problem involving
three variables, only two partial coefficients of correlation
need be computed, if we are interested primarily in the prediction of
$X_1$ scores from known values of $X_2$ and $X_3$.


Step 4 
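The partial $\sigma$'s may be found from the formulas

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}$$

$$\sigma_{2.13} = \sigma_{2.31} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} \qquad \text{(97), page 391}$$

$$\sigma_{3.12} = \sigma_{3.21} = \sigma_3\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2}$$

Substituting the known values of the raw and partial r's in these formulas
we find that $\sigma_{1.23}$ = 6.3; $\sigma_{2.13}$ = 8.9; and $\sigma_{3.12}$ = 4.0. (For the
calculations see Table 52.)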


Step 5

From the partial σ's and the partial r's the numerical values of
the partial regression coefficients $b_{12.3}$ and $b_{13.2}$ are found to be .57
and 1.12, respectively. We may now write the multiple regression
equation in deviation form as

$$\bar{x}_1 = .57x_2 + 1.12x_3$$

In order to write this multiple regression equation in score form
we replace $x_1$ by $(X_1 - 18.5)$; $x_2$ by $(X_2 - 100.6)$; and $x_3$ by
$(X_3 - 24)$. The equation then becomes

$$\bar{X}_1 = .57X_2 + 1.12X_3 - 66$$


Given a student's general intelligence test score ($X_2$) and the
number of hours per week he spends in study ($X_3$), we can estimate from
this equation the "most probable" number of honor points he will
receive during his first semester in college. Suppose that student
J. N. has a general intelligence test score of 120 and that he studies
on the average 20 hours per week: how many honor points will he
then most probably receive during the first semester? Substituting
$X_2 = 120$ and $X_3 = 20$ in the regression equation, we find that

$$\bar{X}_1 = (.57 \times 120) + (1.12 \times 20) - 66 = 25$$

The most probable number of honor points which student J. N. will
receive, therefore, using the given measures as the basis of our
forecast, is 25.
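Steps 3 to 5 can be retraced numerically. The following is a minimal sketch, assuming only the zero-order values quoted in the text for Table 52; the function and variable names are illustrative and not part of the original. Because it carries full precision rather than two-place rounded intermediates, its results agree with the text only to rounding.

```python
import math

# Zero-order values quoted in the text for Table 52.
r12, r13, r23 = 0.60, 0.32, -0.35
s1, s2, s3 = 11.2, 15.8, 6.0       # standard deviations
m1, m2, m3 = 18.5, 100.6, 24.0     # means

def partial_r(rab, rac, rbc):
    """First-order partial r between a and b with c held constant."""
    return (rab - rac * rbc) / math.sqrt((1 - rac**2) * (1 - rbc**2))

r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)

# Partial sigmas (Step 4).
s1_23 = s1 * math.sqrt(1 - r12**2) * math.sqrt(1 - r13_2**2)
s2_13 = s2 * math.sqrt(1 - r23**2) * math.sqrt(1 - r12_3**2)
s3_12 = s3 * math.sqrt(1 - r23**2) * math.sqrt(1 - r13_2**2)

# Partial regression coefficients (Step 5) and the constant K.
b12_3 = r12_3 * s1_23 / s2_13
b13_2 = r13_2 * s1_23 / s3_12
K = m1 - b12_3 * m2 - b13_2 * m3

def predict_honor_points(x2, x3):
    """Score-form regression estimate of X1 from X2 and X3."""
    return b12_3 * x2 + b13_2 * x3 + K

print(round(b12_3, 2), round(b13_2, 2), round(predict_honor_points(120, 20)))
```

The forecast for J. N. comes out at about 25 honor points, as in the text.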


Step 6

This forecast, like every other "most probable" number of honor
points predicted from the regression equation, has an "error of
estimate." The standard error of estimate of any $X_1$ predicted from the
regression equation, $\bar{X}_1 = b_{12.3}X_2 + b_{13.2}X_3 + K$, is written $\sigma_{(est\ X_1)}$,
and equals $\sigma_{1.23}$ directly (p. 381).

The standard error of estimate in the present problem is 6.3, and
in the illustration given above, the twenty-five honor points
estimated for J. N. have a $SE_{(est\ X_1)}$ of about 6 points. This means
that the chances are about two in three that our forecast of
twenty-five honor points will not miss the actual number of honor points
received by J. N. by more than ±6. In general we may say that
two-thirds of all predicted honor point values will lie within ±6 points
of their actual values.
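The "two in three" statement rests on the assumption, not spelled out at this point, that errors of estimate are normally distributed. Under that assumption the chance of an error no larger than one standard error can be sketched as follows (the helper name is illustrative):

```python
import math

def prob_within(k):
    """P(|error| <= k standard errors), assuming normally distributed errors."""
    return math.erf(k / math.sqrt(2))

se_est = 6.3        # standard error of estimate quoted in the text
forecast = 25       # honor points forecast for J. N.
low, high = forecast - se_est, forecast + se_est

print(f"about {prob_within(1):.2f} chance that the actual score lies in "
      f"[{low:.1f}, {high:.1f}]")
```

The probability is about .68, which the text rounds to two chances in three.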


Step 7

The final step in the solution of our three-variable correlation
problem is the computation of the coefficient of multiple correlation.
"Multiple r," generally written R, is defined (see p. 380) as the
coefficient of correlation between scores actually made on the criterion
test and scores on the same test predicted from the regression
equation. For the data of Table 52, R gives the correlation between earned
honor points ($X_1$) and honor points estimated by means of the two
variables, general intelligence ($X_2$) and hours of study ($X_3$), when
these two are combined into a team by means of the regression
equation. The formula for R when we are dealing with three variables is

$$R_{1(23)} = \sqrt{1 - \frac{\sigma^2_{1.23}}{\sigma^2_1}}$$ (106), page 395

In the present problem $R_{1(23)} = .83$. This means that if the most
probable number of honor points which each student in our group of
450 will receive is predicted from the regression equation given on
page 381, the correlation between these 450 predicted scores and the
450 scores actually received will be .83. Multiple R tells us to what
extent $X_1$ is determined by the combined action of $X_2$ and $X_3$; or, in
the present instance, to what extent honor points are related to
general intelligence together with number of study hours per week.

The methods described in this section are not practicable when
there are more than four variables. For multiple correlation
problems involving a large number of tests it is advisable to use short-cut
methods to lessen the amount of numerical calculation. An efficient
and timesaving method is described in Chapter 7 and Appendix A
of R. L. Thorndike's Personnel Selection (New York: John Wiley
and Sons, 1949).*

* See also Chapter 16.


III. General Formulas for Use in Partial
and Multiple Correlation


1. Partial r's of any order


(1) FORMULAS FOR PARTIAL r's

We found in Table 52 that one is able by the method of partial
correlation to find the net relationship between two variables when
the influence of a third is ruled out or held constant. By an
extension of the partial correlation method, we may obtain the net
correlation between $X_1$ and $X_2$ when two or more variables have been
held constant. The partial coefficient of correlation $r_{12.34}$, for
example, means by analogy to $r_{12.3}$ that the correlation between $X_1$ and $X_2$
has been freed of the influence of both $X_3$ and $X_4$; and the partial
coefficient of correlation $r_{12.34 \ldots n}$ means that the correlation
between $X_1$ and $X_2$ has been freed of the influence of a large number of
disturbing factors.

In every partial coefficient of correlation, e.g., $r_{12.34}$, the primary
subscripts to the left of the point (1 and 2) define the two variables
whose net correlation we are seeking. The secondary subscripts to
the right of the point (3 and 4) denote the variables ruled out or held
constant. The order in which the secondary subscripts are written is
immaterial, i.e., $r_{12.34} = r_{12.43}$. The order of the primary subscripts is
of importance, however, as it tells us which variable is taken to be
dependent and which independent. The $r_{12}$ means that $X_1$ is
dependent, that is, is to be predicted from $X_2$; while $r_{21}$ means that $X_2$ is
dependent, that is, is to be predicted from $X_1$. The numerical values $r_{12}$
and $r_{21}$ are, of course, the same. The order of a partial r is
determined by the number of its secondary subscripts. Thus $r_{12}$, an
"entire" or "total" r, is a coefficient of zero order; $r_{12.3}$ is a partial r
of the first order; $r_{12.345}$ is a coefficient of the third order.

The general formula for a partial r is

$$r_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\, r_{2n.34 \ldots (n-1)}}{\sqrt{1 - r^2_{1n.34 \ldots (n-1)}}\ \sqrt{1 - r^2_{2n.34 \ldots (n-1)}}}$$ (96)

(partial correlation coefficient in terms of the coefficients
of lower order, n variables)

From this formula partial r's of any given order may be found. In a
five-variable problem, for example, $(n - 1) = 4$, and $n = 5$, so that
$r_{12.345}$ is written

$$r_{12.345} = \frac{r_{12.34} - r_{15.34}\, r_{25.34}}{\sqrt{1 - r^2_{15.34}}\ \sqrt{1 - r^2_{25.34}}}$$

that is, in terms of the partial r's of the second order. These second
order partial r's must then be computed by formula (96) from r's of
the first order before the third order r, $r_{12.345}$, can be evaluated. In
calculating partial r's Table I may be used to read $\sqrt{1 - r^2}$ values.
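Formula (96) is recursive: a partial r of any order is built from partial r's of the next lower order. A minimal sketch of that recursion follows. The function name and the dictionary layout are illustrative choices, and the four-variable correlations in the usage lines are hypothetical values chosen only to exercise the code.

```python
import math

def partial_r(r, i, j, held=()):
    """Partial correlation of variables i and j with the variables in
    `held` ruled out. `r` maps frozenset({a, b}) to the zero-order r."""
    if not held:
        return r[frozenset((i, j))]
    *rest, n = held   # remove the last held variable by formula (96)
    r_ij = partial_r(r, i, j, rest)
    r_in = partial_r(r, i, n, rest)
    r_jn = partial_r(r, j, n, rest)
    return (r_ij - r_in * r_jn) / math.sqrt((1 - r_in**2) * (1 - r_jn**2))

# Table 52 values quoted in the text.
table52 = {frozenset((1, 2)): 0.60, frozenset((1, 3)): 0.32,
           frozenset((2, 3)): -0.35}
print(round(partial_r(table52, 1, 2, (3,)), 2))   # first-order r12.3

# Hypothetical four-variable correlations (not from the text).
four = {frozenset(k): v for k, v in {
    (1, 2): 0.5, (1, 3): 0.4, (1, 4): 0.3,
    (2, 3): 0.3, (2, 4): 0.2, (3, 4): 0.1}.items()}
r12_34 = partial_r(four, 1, 2, (3, 4))
r12_43 = partial_r(four, 1, 2, (4, 3))   # secondary subscripts reversed
```

Reversing the secondary subscripts leaves the result unchanged, which is the point made in the text about their order being immaterial.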


There are several methods akin to partial correlation which are
useful in certain special problems. Two of these, part correlation and
semi-partial correlation, may be mentioned briefly. These
procedures differ from partial correlation in that they give the net effect
secured by ruling out the influence of one or more variables from only
one of the two correlated measures, instead of from both. For
example, one may wish to know the relation (semi-partial) between
reaction time and speed of reading when differences in size of
vocabulary are held constant with respect to reading only. Part correlation
and semi-partial correlation have not been widely used in mental
measurement. For a discussion of formulas and for illustrations see
references below.*


(2) SIGNIFICANCE OF A PARTIAL r

The significance of a partial r (like that of a zero-order r) may be
tested against the null hypothesis. We may use either Table 25,
page 200, or Table J, column headed 2 variables. The degrees of
freedom for a partial r are $(N - m)$ where N = number of cases and
m = number of variables entering into the partial r. Thus if
$r_{12.345} = .40$ and $N = 75$, $m = 5$ and $(N - m) = 75 - 5$ or 70. The
.05 and .01 significance levels for this r are .23 and .30.

In Table 52, $r_{12.3} = .80$, $N = 450$, $m = 3$, and $(N - m) = 447$.
From Table J, column 2, the r entries by interpolation for $N = 447$
are .09 and .12 at the .05 and .01 levels. The probability that the
obtained $r_{12.3}$ of .80 arose from fluctuations of sampling is much less
than .01; and this is true, also, of $r_{13.2}$ of .71 and $r_{23.1}$ of $-.72$. All
three partial r's, in fact, are highly significant.
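When Table J is not at hand, the same test can be approximated by converting r to a t ratio with $(N - m)$ degrees of freedom. This is a standard equivalent and not the tabular method described in the text. The sketch below also assumes that, for several hundred degrees of freedom, the normal deviates 1.96 and 2.576 may stand in for the critical t values; the function names are illustrative.

```python
import math

def t_for_r(r, df):
    """t ratio for testing a correlation against the null hypothesis."""
    return r * math.sqrt(df) / math.sqrt(1 - r**2)

def critical_r_large_df(z, df):
    """Approximate critical r for large df, using normal deviate z for t."""
    return z / math.sqrt(z**2 + df)

df = 450 - 3                                   # N - m for Table 52
crit_05 = critical_r_large_df(1.96, df)        # .05 level
crit_01 = critical_r_large_df(2.576, df)       # .01 level
print(round(crit_05, 2), round(crit_01, 2), round(t_for_r(0.80, df), 1))
```

The two approximate critical values round to .09 and .12, the figures read from Table J in the text.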


2. Partial σ's of any order


(1) GENERAL FORMULAS

Just as the correlation between two sets of scores can be
determined when the influence of 1, 2, 3 . . . n factors is held constant, so
the variability (σ) of a set of scores can be computed when the
influence of 1, 2, 3 . . . n variables is ruled out. As an illustration,
consider $\sigma_{1.23}$ of Table 52. This partial σ gives the variability of $X_1$
(honor points) freed of the influence upon variability exerted by the
two factors $X_2$ (general intelligence) and $X_3$ (study hours per week).

* Ezekiel, M., Methods of Correlation Analysis (2nd ed.; New York: John
Wiley and Sons, 1941), p. 213.

Dunlap, J. W., and Cureton, E. E., "On the Analysis of Causation," Journal
of Educational Psychology, 1930, 21, 657-680.


The general formula for partial σ's of any order is

$$\sigma_{1.234 \ldots n} = \sigma_1 \sqrt{1 - r^2_{12}}\ \sqrt{1 - r^2_{13.2}}\ \sqrt{1 - r^2_{14.23}} \cdots \sqrt{1 - r^2_{1n.23 \ldots (n-1)}}$$ (97)

(partial σ for n variables)

This formula may be used to compute the net σ's in correlation
problems which involve any number of variables. In a five-variable
problem, for example, $\sigma_{1.2345}$ is written

$$\sigma_{1.2345} = \sigma_1 \sqrt{1 - r^2_{12}}\ \sqrt{1 - r^2_{13.2}}\ \sqrt{1 - r^2_{14.23}}\ \sqrt{1 - r^2_{15.234}}$$

This partial σ is of the fourth order since it has four secondary
subscripts, and the order of a partial σ, like the order of a partial r, is
determined by the number of its secondary subscripts.

By a simple rearrangement of the secondary subscripts, any higher
order σ may be written in more than one way. A partial σ of the
second order may be written in two ways: for example, $\sigma_{1.23}$ which is
given on page 385 as

$$\sigma_{1.23} = \sigma_1 \sqrt{1 - r^2_{12}}\ \sqrt{1 - r^2_{13.2}}$$

may also be written

$$\sigma_{1.32} = \sigma_1 \sqrt{1 - r^2_{13}}\ \sqrt{1 - r^2_{12.3}}$$

In like manner $\sigma_{2.13}$ may be written

(1) $\sigma_{2.13} = \sigma_2 \sqrt{1 - r^2_{12}}\ \sqrt{1 - r^2_{23.1}}$
or
(2) $\sigma_{2.31} = \sigma_2 \sqrt{1 - r^2_{23}}\ \sqrt{1 - r^2_{12.3}}$

and $\sigma_{3.12}$ may be written

(1) $\sigma_{3.12} = \sigma_3 \sqrt{1 - r^2_{13}}\ \sqrt{1 - r^2_{23.1}}$
or
(2) $\sigma_{3.21} = \sigma_3 \sqrt{1 - r^2_{23}}\ \sqrt{1 - r^2_{13.2}}$
are concerned. Furthermore, if $r_{23.1}$ is not wanted for other purposes,
it need not be calculated at all (see p. 381). Two partial r's are all
that are required in order to write the regression equation of a
three-variable problem.
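That the alternate ways of writing a second-order partial σ agree can be checked numerically. The sketch below assumes the Table 52 values quoted in the text; the names are illustrative.

```python
import math

r12, r13, r23 = 0.60, 0.32, -0.35   # zero-order r's, Table 52
s1, s2, s3 = 11.2, 15.8, 6.0        # zero-order sigmas, Table 52

def partial_r(rab, rac, rbc):
    """First-order partial r between a and b with c held constant."""
    return (rab - rac * rbc) / math.sqrt((1 - rac**2) * (1 - rbc**2))

def k(r):
    """The factor sqrt(1 - r^2) that Table I is said to supply."""
    return math.sqrt(1 - r**2)

r12_3 = partial_r(r12, r13, r23)
r13_2 = partial_r(r13, r12, r23)
r23_1 = partial_r(r23, r12, r13)

# Each partial sigma written in its two forms.
s1_23, s1_32 = s1 * k(r12) * k(r13_2), s1 * k(r13) * k(r12_3)
s2_13, s2_31 = s2 * k(r12) * k(r23_1), s2 * k(r23) * k(r12_3)
s3_12, s3_21 = s3 * k(r13) * k(r23_1), s3 * k(r23) * k(r13_2)

print(round(s1_23, 1), round(s2_13, 1), round(s3_12, 1))
```

Forms (2) for $\sigma_{2.13}$ and $\sigma_{3.12}$ avoid $r_{23.1}$ altogether, which is why only two partial r's are needed for the regression equation.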


3. Multiple regression equations and partial regression coefficients


(1) THE MULTIPLE REGRESSION EQUATION FOR ANY NUMBER OF VARIABLES

The regression equation which expresses the relationship between
a single dependent or criterion variable, $X_1$, and any number of
independent variables, $X_2$, $X_3$, $X_4$ . . . $X_n$, may be written in deviation
form as follows:

$$\bar{x}_1 = b_{12.34 \ldots n}\,x_2 + b_{13.24 \ldots n}\,x_3 + \cdots + b_{1n.23 \ldots (n-1)}\,x_n$$ (98)

(regression equation, deviation form, for n variables)

and in score form

$$\bar{X}_1 = b_{12.34 \ldots n}\,X_2 + b_{13.24 \ldots n}\,X_3 + \cdots + b_{1n.23 \ldots (n-1)}\,X_n + K$$ (99)

(regression equation, score form, for n variables)

The partial regression coefficients $b_{12.34 \ldots n}$, $b_{13.24 \ldots n}$, etc., give the
weights to be attached to the scores of each independent variable
when $X_1$ is to be estimated from all of these in combination.
Furthermore, the regression coefficients give the weight which each variable
exerts in determining $X_1$ when the influence of the other variables is
excluded. Hence, we can tell from the regression equation just what
role each of the several test variables plays in determining the score
on Test 1, the test taken as the criterion.


(2) THE MULTIPLE REGRESSION EQUATION FOR THREE VARIABLES (SPECIAL FORM)

When a problem involves only three variables, the regression
equation, as we have seen, is written

$$\bar{x}_1 = b_{12.3}x_2 + b_{13.2}x_3$$ (deviation form)

If the partial r's and the partial σ's are of no special interest, it is
possible to express the equation above in a somewhat more
convenient form for calculation, as follows:

$$\bar{x}_1 = \frac{\sigma_1 (r_{12} - r_{13}r_{23})}{\sigma_2 (1 - r^2_{23})}\,x_2 + \frac{\sigma_1 (r_{13} - r_{12}r_{23})}{\sigma_3 (1 - r^2_{23})}\,x_3$$ (100)

(regression equation for three variables, special form)

or in score form

$$\bar{X}_1 = \frac{\sigma_1 (r_{12} - r_{13}r_{23})}{\sigma_2 (1 - r^2_{23})}\,X_2 + \frac{\sigma_1 (r_{13} - r_{12}r_{23})}{\sigma_3 (1 - r^2_{23})}\,X_3 + K$$ (101)

(regression equation for three variables, special form)

As this equation involves only zero order r's and zero order σ's,
$X_1$ may be estimated from it without the computation of any partial
r's or partial σ's. We may illustrate using the data given in Table 52,
page 381. Substituting for $\sigma_1 = 11.2$, $\sigma_2 = 15.8$, $\sigma_3 = 6$, $r_{12} = .60$,
$r_{13} = .32$, and $r_{23} = -.35$, we have

$$\bar{x}_1 = \frac{11.2\,(.60 + .32 \times .35)}{15.8\,(1 - .35^2)}\,x_2 + \frac{11.2\,(.32 + .60 \times .35)}{6\,(1 - .35^2)}\,x_3$$

$$\bar{x}_1 = .57x_2 + 1.12x_3$$

which checks the regression equation as calculated in Table 52.
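The special form lends itself to a short function. The sketch below is one way to code formulas (100) and (101), with names chosen for illustration; the means used to obtain K are those quoted earlier in the text for Table 52.

```python
def special_form(r12, r13, r23, s1, s2, s3, m1, m2, m3):
    """Three-variable regression weights from zero-order r's and sigmas
    only (formula 100), plus the score-form constant K (formula 101)."""
    denom = 1 - r23**2
    b12_3 = s1 * (r12 - r13 * r23) / (s2 * denom)
    b13_2 = s1 * (r13 - r12 * r23) / (s3 * denom)
    k = m1 - b12_3 * m2 - b13_2 * m3
    return b12_3, b13_2, k

b12_3, b13_2, K = special_form(0.60, 0.32, -0.35,
                               11.2, 15.8, 6.0,
                               18.5, 100.6, 24.0)
print(round(b12_3, 3), round(b13_2, 3), round(K))
```

Carried at full precision the first weight is nearer .575 than .57, so agreement with the rounded figures in the text is to about two decimal places.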




(3) PARTIAL REGRESSION COEFFICIENTS (b's)

Partial regression coefficients may be computed from the formula

$$b_{12.34 \ldots n} = r_{12.34 \ldots n}\,\frac{\sigma_{1.234 \ldots n}}{\sigma_{2.134 \ldots n}}$$ (102)

(partial regression coefficients in terms of partial coefficients of
correlation and standard errors of estimate, n variables)

When the problem involves three variables, the regression coefficients,
$b_{12.3}$ and $b_{13.2}$, are, like $r_{12.3}$ and $r_{13.2}$, of the first order. The first
regression coefficient, $b_{12.3}$, equals $r_{12.3}\,\dfrac{\sigma_{1.23}}{\sigma_{2.13}}$ and the second regression
coefficient, $b_{13.2}$, equals $r_{13.2}\,\dfrac{\sigma_{1.23}}{\sigma_{3.12}}$.

Partial regression coefficients which involve more than three
variables may be calculated from formula (102). In a five-variable
problem, for example, the regression coefficients (of the third order)
are

$$b_{12.345} = r_{12.345}\,\frac{\sigma_{1.2345}}{\sigma_{2.1345}}, \qquad b_{13.245} = r_{13.245}\,\frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \text{ etc.}$$

In order to find these partial regression coefficients we first compute
the third order partial r's and the fourth order partial σ's.

The b's are determined by the σ's of the tests and these in turn
depend upon the units in terms of which the test is scored. The
b-coefficients give the weights of scores in the independent variables,
$X_2$, $X_3$, etc., but not the contribution of these variables without
regard to the scoring system employed. The latter contribution is
given by the "beta weights," described in (4) below.


(4) THE BETA (β) COEFFICIENTS

When expressed in terms of σ-scores, partial regression coefficients
are usually called beta coefficients. The beta coefficients may be
calculated directly from the b's as follows:

$$\beta_{12.34 \ldots n} = b_{12.34 \ldots n}\,\frac{\sigma_2}{\sigma_1}$$ (103)

(beta coefficients calculated from partial regression coefficients)

The multiple regression equation for n variables may also be
written in σ-scores as

$$\bar{z}_1 = \beta_{12.34 \ldots n}\,z_2 + \beta_{13.24 \ldots n}\,z_3 + \cdots + \beta_{1n.23 \ldots (n-1)}\,z_n$$ (104)

(multiple regression equation in terms of σ-scores)

Beta coefficients are often called "beta weights" to distinguish them
from the "score weights" (b's) of the ordinary multiple regression
equation. When all of our tests have been expressed in
σ-scores (all Means = .00 and all σ's = 1.00) differences in test units
as well as differences in variability are allowed for. We are then able
to determine from the correlations alone the relative weight with
which each independent variable "enters in" or contributes to the
criterion, independently of the other factors.

To illustrate with the data in Table 52, we find that
$\beta_{12.3} = .57 \times \dfrac{15.8}{11.2}$ or .81 and that $\beta_{13.2} = 1.12 \times \dfrac{6}{11.2}$ or .60.
From (104) above we get

$$\bar{z}_1 = .81z_2 + .60z_3$$

This equation should be compared with the multiple regression
equation $\bar{x}_1 = .57x_2 + 1.12x_3$ in Table 52 which gives the weights to
be attached to the scores in $X_2$ and $X_3$. The weights of .57 and 1.12
tell us the amount by which scores in $X_2$ and $X_3$ must be multiplied
in order to give the "best" prediction of $X_1$. But these weights do not
give us the relative importance of general intelligence and study
habits in determining the number of honor points a freshman will
receive. This information is given by the beta weights. It is of
interest to note that while the actual score weights are as 1:2 (.57 to 1.12)
the independent contributions of general intelligence ($z_2$) and study
habits ($z_3$) are in the ratio of .81 to .60 or as 4:3. When the
variabilities of the tests are equalized,
general intelligence has a proportionately greater influence than
study habits in determining academic achievement. This is certainly
the result to be expected.
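The contrast between score weights and beta weights can be made concrete with a short sketch. The beta weights below are computed from the zero-order r's alone, using the standard three-variable expression that follows from (100) and (103). Restating study time in minutes is a hypothetical rescaling introduced only for illustration: it changes the score weight and leaves the beta weight untouched.

```python
r12, r13, r23 = 0.60, 0.32, -0.35   # Table 52
s1, s2, s3 = 11.2, 15.8, 6.0

def betas(r12, r13, r23):
    """Beta weights for three variables from the zero-order r's alone."""
    d = 1 - r23**2
    return (r12 - r13 * r23) / d, (r13 - r12 * r23) / d

def score_weights(beta2, beta3, s1, s2, s3):
    """Invert formula (103): b = beta * sigma_1 / sigma_k."""
    return beta2 * s1 / s2, beta3 * s1 / s3

beta2, beta3 = betas(r12, r13, r23)
b_hours = score_weights(beta2, beta3, s1, s2, s3)
# Hypothetical change of unit: study time in minutes, so sigma_3 is 60 times larger.
b_minutes = score_weights(beta2, beta3, s1, s2, s3 * 60)

print(round(beta2, 2), round(beta3, 2))
print(round(b_hours[1], 3), round(b_minutes[1], 4))
```

The weight for study time falls to one-sixtieth of its former value when minutes replace hours, while .81 and .60 are unaffected.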


4. The standard error of estimate for multiple regression equations 


All $X_1$ scores estimated fr
standard error of estimate
Scores given by the regression equation instead


directly by $\sigma_{1.234 \ldots n}$ as follows


$$\sigma_{(est\ X_1)} = \sigma_{1.234 \ldots n}$$ (105)
(standard error of estimate for n variables)


$\sigma_{1.234 \ldots n}$ must be computed in order to evaluate the partial
$\sigma_{(est\ X_1)}$ is always calculated in the
e 52, the $\sigma_{(est\ X_1)}$ of a prediction of


es are about seven in ten or two in
three that the "most probable" honor point score forecast for any


5 or less.

urther into the meaning of $\sigma_{(est\ X_1)}$:
equals $\sigma_{1.23}$; and the latter indicates
the variability of Test 1 (honor points) obtained by


Sod Pus ni) the influence of Tests 2 and 3
(general intelligence and study effort),


$\bar{X}_1 = .57X_2 + 1.12X_3 - 66$
(see p. 381), $X_1$ (honor points) can be predicted with a smaller error


r linear equation. Put differently, the
standard error of estimate is a minimum when the regression
equation is used to estimate $X_1$ scores.* Hence, the values of $X_1$
predicted from the multiple regression equation are the "best estimates"
of the actual $X_1$ values which can be made from a linear equation
containing the given variables.


5. The coefficient of multiple correlation, R 


(1) GENERAL FORMULAS 

The correlation between a single dependent or criterion variable
$X_1$ and $(n - 1)$ independent variables combined by means of a
multiple regression equation is given by the formula

$$R_{1(234 \ldots n)} = \sqrt{1 - \frac{\sigma^2_{1.234 \ldots n}}{\sigma^2_1}}$$ (106)

(multiple correlation coefficient in terms of partial
σ's, n variables)

in which
$R_{1(234 \ldots n)}$ = the coefficient of multiple correlation
$\sigma_1$ = the standard deviation of the criterion ($X_1$) scores
$\sigma_{1.234 \ldots n}$ = the variability left in Test 1 when the variability of
Tests 2, 3 . . . n is held constant through partial
correlation.

When there are only three variables, the multiple coefficient of
correlation becomes

$$R_{1(23)} = \sqrt{1 - \frac{\sigma^2_{1.23}}{\sigma^2_1}}$$

when there are five variables

$$R_{1(2345)} = \sqrt{1 - \frac{\sigma^2_{1.2345}}{\sigma^2_1}}$$

If we replace $\sigma_{1.234 \ldots n}$ in formula (106) by its value in terms of
the entire and partial r's [see formula (97)] we may write the
general formula for $R_{1(234 \ldots n)}$ as follows:

$$R_{1(234 \ldots n)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2}) \cdots (1 - r^2_{1n.23 \ldots (n-1)})\right]}$$ (107)

(multiple coefficient of correlation in terms of partial coefficients
of correlation, n variables)


* Yule, G. U., and Kendall, M. G., An Introduction to the Theory of
Statistics, op. cit., pp. 262-267.
Since a higher order σ may be written in a variety of ways, the
number depending upon its order (see p. 389), there are several alternate
forms for R. These serve as valuable means of checking the accuracy
of our arithmetical calculations. In a three-variable problem, for
example, $R_{1(23)}$ may be written as

$$R_{1(23)} = \sqrt{1 - \left[(1 - r^2_{12})(1 - r^2_{13.2})\right]}$$

or as

$$R_{1(23)} = \sqrt{1 - \left[(1 - r^2_{13})(1 - r^2_{12.3})\right]}$$

The standard error of estimate is a minimum when the multiple
regression equation is employed in estimating $X_1$ scores (p. 395).
Hence the multiple coefficient of correlation, R, is the maximum
correlation obtainable between actual $X_1$ scores and $X_1$ scores estimated
from a knowledge of the variables $X_2$, $X_3$ . . . $X_n$ in the regression
equation. The truth of this statement is contingent upon linearity of
regression in all of the correlations. R indicates how accurately a
given combination of variables represents the actual values of $X_1$
(the criterion) when our test scores are combined in accordance with
the "best" linear equation.


(2) MULTIPLE R IN TERMS OF β COEFFICIENTS

$R^2$ may be expressed in terms of the beta coefficients and the zero
order r's:

$$R^2_{1(234 \ldots n)} = \beta_{12.34 \ldots n}\,r_{12} + \beta_{13.24 \ldots n}\,r_{13} + \cdots + \beta_{1n.23 \ldots (n-1)}\,r_{1n}$$ (108)

(multiple $R^2$ in terms of β coefficients and zero order r's)

For three variables (108) becomes

$$R^2_{1(23)} = \beta_{12.3}\,r_{12} + \beta_{13.2}\,r_{13}$$

From page 393 we find $\beta_{12.3} = .81$ and $\beta_{13.2} = .60$; and from Table
52 that $r_{12} = .60$ and $r_{13} = .32$. Substituting in (108) above, we get

$$R^2_{1(23)} = .81 \times .60 + .60 \times .32 = .49 + .19$$

$$R^2_{1(23)} = .68$$

$$R_{1(23)} = .82$$

$R^2_{1(234 \ldots n)}$ gives the proportion of the variance of the criterion
measure ($X_1$) attributable to the joint action of the variables $X_2$,
$X_3$ . . . $X_n$. As shown above, $R^2_{1(23)} = .68$; and, accordingly, 68% of
whatever makes freshmen differ in (1) school achievement can be
attributed to differences in (2) general intelligence and (3) study
habits. By means of formula (108) the total contribution of .68 can
be broken down further into the independent contributions of general
intelligence ($X_2$) and study habits ($X_3$). Thus from the equation
$R^2_{1(23)} = .49 + .19$, we know that 49% is the contribution of general
intelligence to the variance of honor points, and 19% is the
contribution of study habits. The remaining 32% of the variance of $X_1$ must
be attributed to factors not measured in our problem.
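The breakdown of $R^2$ by formula (108) can be reproduced in a few lines. This is a sketch using the Table 52 correlations quoted in the text, with the beta weights computed at full precision rather than taken as .81 and .60; variable names are illustrative.

```python
import math

r12, r13, r23 = 0.60, 0.32, -0.35   # Table 52

# Beta weights for three variables from the zero-order r's.
d = 1 - r23**2
beta2 = (r12 - r13 * r23) / d
beta3 = (r13 - r12 * r23) / d

share_intelligence = beta2 * r12     # first term of (108)
share_study = beta3 * r13            # second term of (108)
R2 = share_intelligence + share_study
R = math.sqrt(R2)
unexplained = 1 - R2

print(round(share_intelligence, 2), round(share_study, 2),
      round(R2, 2), round(R, 2), round(unexplained, 2))
```

The two shares come to about .49 and .19, leaving about .32 of the variance unaccounted for, as stated in the text.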


(3) THE SIGNIFICANCE OF R

Multiple R is positive,* always less than 1.00, and always greater
than the correlation coefficients $r_{12}$, $r_{13}$ . . . $r_{1n}$. The significance of an
R can best be tested, perhaps, against the null hypothesis by means
of Table J. This table must be entered with the number of variables
(m) in the problem and with $(N - m)$ degrees of freedom. To
illustrate with Table 52, $R = .83$, $N = 450$, $m = 3$ and $(N - m) = 450 - 3$
or 447. From the column headed "3" in Table J we read that for 447
degrees of freedom the R's at the .05 and .01 levels (by interpolation)
are .116 and .143. Only once in twenty trials would an R of .116 arise
by sampling fluctuations on the null hypothesis, and only once in 100
trials would an R of .143 occur. As our R is very much larger than
.143, it is highly significant. Table J may be used with problems
involving up to nine variables. Suppose that $R_{1(2345)} = .526$ and
$N = 40$. From the column headed "5 variables" in Table J, we find
that for $40 - 5$ or 35 degrees of freedom, the R's are .482 and .556 at
the .05 and .01 levels. The obtained R is significant, therefore, at
the .05, but not at the .01, level.


* Since R is always taken as positive, chance errors are cumulative and may
be large if the sample is small and the number of variables large. For the
correction of R for chance errors, see formula (109), page 407.


6. Factors determining the selection of tests in a battery

The effectiveness with which the composite score obtained from a
battery of tests measures the criterion depends (1) upon the
intercorrelations of the tests in the battery as well as (2) upon the
correlations of these tests with the criterion, that is, their validity coefficients.
This appears clearly in Table 53 in which the criterion correlation of
each test is .30, but the intercorrelations of the tests of the battery
vary from .00 to .60. When the tests are uncorrelated (all criterion
r's being .30), an increase in size of the battery from 1 to 9 tests
raises multiple R from .30 to .90. However, when the
intercorrelations of the tests are all .60 and the battery is increased in size from
1 to 9 tests, multiple R goes from .30 to .37. Even when the number
of tests in the battery is 20 multiple R is only .38.

TABLE 53* Effect of intercorrelations on multiple correlation

Number of Tests        Size of Intercorrelations
                       .00     .10     .30     .60

       1               .30     .30     .30     .30
       2               .42     .40     .37     .34
       4               .60     .53     .44     .36
       9               .90     .67     .48     .37
      20                †      .79     .52     .38


* From R. L. Thorndike, Personnel Selection (New York: John Wiley and
Sons, 1949), p. 191.

† It is mathematically impossible for 20 tests all to correlate 0.30 with some
measure and still have zero intercorrelations.
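The pattern in Table 53 can be reproduced from a closed form that the text does not state: when every test has the same validity v and every pair of tests the same intercorrelation r, the multiple correlation of an n-test battery is $R = \sqrt{n v^2 / (1 + (n - 1) r)}$. The sketch below assumes that standard result; the function name is illustrative, and individual cells may differ from the printed table by .01 through rounding.

```python
import math

def battery_R(n_tests, validity, intercorrelation):
    """Multiple R for n tests with equal validities and equal
    intercorrelations (assumed closed form, not given in the text)."""
    r2 = n_tests * validity**2 / (1 + (n_tests - 1) * intercorrelation)
    if r2 > 1:
        raise ValueError("impossible combination of correlations")
    return math.sqrt(r2)

for n in (1, 2, 4, 9):
    print(n, round(battery_R(n, 0.30, 0.00), 2), round(battery_R(n, 0.30, 0.60), 2))
```

With 20 uncorrelated tests the formula would give $R^2 = 1.8$, which is the impossibility noted in the second footnote to the table; the function raises an error in that case.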


A test may also add to the validity of a battery by acting as a
"suppressor" variable. Suppose that Test A correlates .50 with a
criterion (has good validity) while Test B correlates only .10 with
the criterion but .60 with Test A. The $R_{c(AB)} = .56$ despite the low
validity of Test B. This is because Test B acts as a suppressor: it
takes out some of Test A's "non-valid" variance, thus raising the
criterion correlation of the battery.* (See Fig. 62.) The weights of
these two tests in the regression equation connecting the criterion
with A and B are .69 and $-.31$. The negative weight of Test B serves
to suppress that part of Test A not related to the criterion and thus
gives a better (more valid) measure of the criterion than can be
obtained with Test A alone.
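The suppressor figures can be checked from the three correlations given. The sketch below uses the three-variable beta-weight expressions and formula (108); the variable names are illustrative.

```python
import math

r_cA, r_cB, r_AB = 0.50, 0.10, 0.60   # correlations stated in the text

d = 1 - r_AB**2
beta_A = (r_cA - r_cB * r_AB) / d     # weight of Test A
beta_B = (r_cB - r_cA * r_AB) / d     # weight of Test B (negative: suppressor)

R = math.sqrt(beta_A * r_cA + beta_B * r_cB)

print(round(beta_A, 2), round(beta_B, 2), round(R, 2))
```

The battery correlation of about .56 exceeds the .50 obtained from Test A alone, even though Test B receives a negative weight.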


FIG. 62 


* See also Table 52. Here Tests 2 and 3 take out relatively distinct parts of
1 (the criterion), being negatively correlated, so that $R_{1(23)}$ (.83) is
significantly increased over $r_{12}$ (.60).


IV. Spurious Correlation


The correlation between two sets of test scores is said to be
spurious when it is due in some part, at least, to factors other than those
which determine performance in the tests themselves. In general, the
cause of spurious correlation lies in a failure to control conditions;
and the most usual effect of this lack of control is a "boosting" or
inflation of the coefficient. Some of the situations which may lead to
spurious correlation will be given in this section.


1. Spurious correlation arising from heterogeneity


We have shown elsewhere (p. 378) how a lack of uniformity in age
conditions will lead to correlations which are spuriously high.
Failure to take account of heterogeneity introduced by the age factor is
a prolific source of error in correlational work. To cite an example,
within a group of boys ten to eighteen years old, a substantial
correlation will appear between strength of grip and memory span, quite
apart from any intrinsic relationship, due solely to the fact that both
variables increase with age. In stating the correlation between two
tests, or the reliability coefficient of a test, one should always be
careful to specify the range of ages, grades included, and other data
bearing upon physical, mental, and cultural differences, in order to
show the degree of heterogeneity in the group. Without this
information, the r may be of little value.

Heterogeneity is introduced by other factors than age. If alco- 
holism, degeneracy, and bad heredity are all positively related, the r 
between alcoholism and degeneracy will be too high (because of the 
effect of heredity upon both factors) unless heredity can be “held 
constant.” Again, assume that we have measured two distinctly 
different groups, 500 college seniors and 500 day laborers, upon a 
cancellation test and upon a general intelligence test. The mean abil- 
ity in both tests will be definitely higher in the college group. Now 
even if the correlation between the two tests is zero within each group 
taken separately, if the two 


tion will appear because of the heterogeneity of the group with re- 


To be a valid measure of relationship, a correlation coefficient 
must be freed of the extraneous influences which affect the relation- 
s may be accomplished 
ch age, or whatever the 


В A e measured and its corre- 
lation with the variables studied can be calculated. 
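The effect of pooling two distinct groups can be shown with a toy data set. The eight score pairs below are hypothetical, constructed so that the correlation is exactly zero within each group while the second group stands five points higher on both tests; the helper function is an ordinary product-moment r.

```python
import math

def pearson_r(pairs):
    """Product-moment correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: zero correlation inside each group.
group_low = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_high = [(x + 5, y + 5) for x, y in group_low]

r_low = pearson_r(group_low)
r_high = pearson_r(group_high)
r_combined = pearson_r(group_low + group_high)

print(r_low, r_high, round(r_combined, 2))
```

The combined correlation is high although neither group shows any relationship by itself, because the difference between group means supplies all of the covariation.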


* Garrett, H. E., and Anastasi, A., "The Tetrad-Difference Criterion and the
Measurement of Mental Traits," Annals New York Academy of Sciences, 1932,


2. Spurious index correlation†


Even when three variables, $X_1$, $X_2$, and $X_3$, are uncorrelated, a
correlation between the indices $Z_1$ and $Z_2$ (where $Z_1 = X_1/X_3$ and
$Z_2 = X_2/X_3$) may appear which is as large as .50. To illustrate, if
two individuals observe a series of magnitudes (e.g., Galton bar
settings) independently, the absolute errors of observation ($X_1$ and $X_2$)
may be uncorrelated, and still an appreciable correlation appear
between the errors made by the two observers, when these are
expressed as percents of the observed magnitudes ($X_3$). The spurious
element here, of course, is the common factor $X_3$ in the denominator
of the ratios.

One of the commonest examples of a spurious index relationship
in psychology is found in the correlation of I.Q.'s or E.Q.'s obtained
from intelligence and achievement tests. If the I.Q.'s of 500 children
ranging in age from three to fourteen years are calculated from two
tests $X_1$ and $X_2$, the correlation is between $\dfrac{M.A._1}{C.A.}$ and $\dfrac{M.A._2}{C.A.}$. If C.A.
were a constant (the same for all children) it would have no effect
on the correlation and we would simply be correlating M.A.'s. But
when C.A. varies from child to child there is usually a correlation
between C.A. and M.A. which tends to increase the r between I.Q.'s,
sometimes considerably.

† Yule, G. U., An Introduction to the Theory of Statistics, op. cit., pp. 215-

Thomson, G. H., and Pintner, R., "Spurious Correlation and Relationship
between Tests," Journal of Educational Psychology, 1924, 15, 433-444.
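The index effect can be seen in a small simulation. The sketch below is illustrative only: it assumes three independent, normally distributed variables with equal relative variability (mean 100, standard deviation 10), a fixed random seed, and a hypothetical sample of 20,000 cases. None of these settings comes from the text.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 20000
x1 = [rng.gauss(100, 10) for _ in range(n)]
x2 = [rng.gauss(100, 10) for _ in range(n)]
x3 = [rng.gauss(100, 10) for _ in range(n)]   # the common denominator

z1 = [a / c for a, c in zip(x1, x3)]          # index Z1 = X1 / X3
z2 = [b / c for b, c in zip(x2, x3)]          # index Z2 = X2 / X3

r_raw = pearson_r(x1, x2)
r_index = pearson_r(z1, z2)
print(round(r_raw, 2), round(r_index, 2))
```

Under these assumptions the raw variables correlate near zero while the two indices correlate in the neighborhood of .50, the figure mentioned in the text.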


3. Spurious correlation between averages

Spurious correlation usually results when the average scores made
by a number of different groups on a given test are correlated against
the average scores made by the same groups on a second test. An
example is furnished by the correlations reported between mean
intelligence test scores, by states, and such "educational" factors as
number of schools, books sold, magazines circulated in the states, etc.
Most of these correlations are high, many above .90. If average
correlations by states are compared with the correlations between
intelligence scores and number of years spent in school within the
separate states, these latter r's are usually much lower.
Correlations between averages become "inflated" because a large number
of factors which ordinarily reduce the correlation within a single
group cancel out when averages are taken from group to group.
Average intelligence test scores, for instance, increase regularly as
we go up the occupational scale from day laborer to the professions;
but the correlation between intelligence and status (training, salary,
etc.) at a given occupational level is far from perfect.

PROBLEMS 


1. The correlation between a general intelligence test and school
achievement in a group of children from eight to fourteen years old is .80. The
correlation between the general intelligence test and age in the same
group is .70; and the correlation between school achievement and age is
.60. What is the correlation between general intelligence and school
achievement in children of the same age? Comment upon your result.


2. In a group of 100 college freshmen, the correlation between (1)
intelligence and (2) the A-cancellation test is .20. The correlation between
(1) intelligence and (3) a battery of controlled association tests in the
same group is .70. If the correlation between (2) cancellation and (3)
controlled association is .45, what is the "net" correlation between
intelligence and cancellation in this group? Between intelligence and
controlled association? Interpret your results.


3. Explain why some variables are of such a nature that it is difficult to
hold them "constant," and hence to employ them in problems involving
partial correlation.


4. Given the following data for fifty-six children:

$X_1$ = Stanford-Binet I.Q.
$X_2$ = Memory for Objects
$X_3$ = Cube Imitation

$M_1 = 101.71$    $M_2 = 10.06$    $M_3 = 3.35$
$\sigma_1 = 13.65$    $\sigma_2 = 3.06$    $\sigma_3 = 2.02$
$r_{12} = .41$    $r_{13} = .50$    $r_{23} = .16$

(a) Work out the regression equation of $X_2$ and $X_3$ upon $X_1$,
using the method of Section II.

(b) Compute $R_{1(23)}$ and $\sigma_{(est.\ X_1)}$.

(c) If a child's score is 12 in Test $X_2$ and 4 in Test $X_3$, what is
his most probable score in $X_1$ (I.Q.)?


5. Let $X_1$ be a criterion and $X_2$ and $X_3$ be two other tests.
Correlations and $\sigma$'s are as follows:

$r_{12} = .60$    $\sigma_1 = 5.00$
$r_{13} = .50$    $\sigma_2 = 10.00$
$r_{23} = .20$    $\sigma_3 = 8.00$

6. Given a team of two tests, each of which correlates .50 with a
criterion. If the two tests correlate .20

addition of two such tests improve the predictive value of the team?


7. Test A correlates .60 with a criterion and .50 with Test B, which
correlates only .10 with the criterion. What is the multiple R of A and
B with the criterion? Why is it higher than the correlation of A with
the criterion?

8. Two absolutely independent tests B and C completely determine the
criterion A. If B correlates .50 with A, what is the correlation of C
and A? What is the multiple correlation of A with B and C?


9. Comment upon the following statements:

(a) It is good practice to correlate E.Q.'s achieved upon two educational
achievement tests, no matter how wide the age range.

(b) The positive correlation between average AGCT scores by states
and the average elevation of the states above sea level proves
the close relationship of intelligence and geography.

(c) The correlation between memory test scores and tapping rate in a
group of 200 eight-year-old children is .20; and the correlation
between memory test scores and tapping rate in a group of 100 college
freshmen is .10. When the two groups are combined the correlation
between these two tests becomes .40. This shows that we must have
large groups in order to get high correlations.


ANSWERS

1. $r = .67$
2. $r$ (intelligence and cancellation) $= -.18$; $r$ (intelligence and
   controlled association) $= .70$
4. (a) $\bar{X}_1 = 1.47X_2 + 2.98X_3 + 76.95$
   (b) $R_{1(23)} = .60$; $\sigma_{(est.\ X_1)} = 10.93$
   (c) 106.50 or 107
5. From $X_2$ alone, $\sigma_{(est.\ X_1)} = 4.0$
   From $X_3$ alone, $\sigma_{(est.\ X_1)} = 4.3$
   From $X_2$ and $X_3$, $\sigma_{(est.\ X_1)} = 3.5$
6. (a) R increases from .64 to .73
   (b) R increases from .64 to .79
7. $R_{c(AB)} = .65$
8. $r_{AC} = .87$; $R_{A(BC)} = 1.00$
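The answers to Problems 1 and 2 can be checked with a few lines of code. The sketch below assumes the standard first-order partial correlation formula, $r_{12.3} = (r_{12} - r_{13}r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$, which is not restated in this passage; the function name `partial_r` is illustrative only.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 (variable 3 held constant)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Problem 1: intelligence (1), school achievement (2), age (3) held constant.
p1 = partial_r(0.80, 0.70, 0.60)

# Problem 2: intelligence and cancellation, controlled association held constant.
p2_cancel = partial_r(0.20, 0.70, 0.45)
# Problem 2: intelligence and controlled association, cancellation held constant.
p2_assoc = partial_r(0.70, 0.20, 0.45)

print(round(p1, 2), round(p2_cancel, 2), round(p2_assoc, 2))
```

Rounded to two places, the three values agree with the printed answers for Problems 1 and 2.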
I. The Wherry-Doolittle Test Selection Method *


The method of solving multiple correlation problems outlined in 
Section II and Table 52 of Chapter 15 is adequate enough when 
there are only three (or not more than four) variables. In problems 
involving more than four variables, however, the mechanics of cal- 
culation become almost prohibitive unless some systematic scheme 
of solution is adopted (p. 387). The Wherry-Doolittle test selection
method, to be presented in this section, provides a method of solving 
certain types of multiple correlation problems with a minimum of 
statistical labor. This method selects the tests of the battery ana- 
lytically and adds them one at a time until a maximum R is obtained. 
To illustrate, suppose we wish to predict aptitude for a certain technical
job in a factory. Criterion ratings for job proficiency have been
obtained and eight tests tried out as possible indicators of job apti- 
tude. By use of the Wherry-Doolittle method we can (1) select those 
tests (e.g., three or four) which yield a maximum R with the criterion 
and discard the rest; (2) calculate the multiple R after the addition 
of each test, stopping the process when R no longer increases; (3) 
compute a multiple regression equation from which the criterion can 
be predicted with the highest precision of which the given list of tests 
is capable. 

The application of the Wherry-Doolittle test selection method to 
an actual problem is shown in Example (1) below. Steps in compu- 
tation are outlined in order and are illustrated by reference to the 
data of Example (1), so that the student may follow the process in 
detail. 

* Stead, W. H., Shartle, C. L., et al., Occupational Counseling
Techniques, op. cit., Appendix 5.
1. Solution of a multiple correlation problem by the Wherry-Doolittle
Test Selection Method

Example (1) In Table 54 are presented the intercorrelations of ten
tests administered in the Minnesota study of Mechanical Ability. The
criterion, called the "quality" criterion, was a measure of the
excellence of mechanical work done by 100 junior high-school boys. The
tests in Table 54 are fairly representative of the wide range of
measures used in the Minnesota study. Our immediate problem is to
choose from among these variables the most valid battery of tests,
i.e., those tests which will predict the criterion most efficiently.
Selection of tests is made by the Wherry-Doolittle method.


TABLE 54 Intercorrelations of ten tests and a criterion

(Data from the Minnesota Study of Mechanical Ability *)

List of Tests (N = 100)
 C = Quality criterion
 1 = Packing blocks
 2 = Card sorting
 3 = Minnesota spatial relations boards, A, B, C, D
 4 = Paper form boards A and B
 5 = Stenquist Picture I
 6 = Stenquist Picture II
 7 = Minnesota assembly boxes, A, B, C
 8 = Mechanical operations questionnaire
 9 = Interest analysis blank
10 = Otis intelligence test

        1     2     3     4     5     6     7     8     9    10
 C    .26   .19   .53   .52   .24   .31   .55   .30   .55   .26
 1          .32   .34   .14   .18   .21   .30   .10   .34   .00
 2                .23   .14   .10   .24   .13  -.12   .23   .08
 3                      .63   .42   .39   .56   .22   .55   .23
 4                            .37   .30   .49   .24   .61   .56
 5                                  .54   .46   .47   .23   .11
 6                                        .40   .19   .13   .21
 7                                              .40   .41   .13
 8                                                    .25   .18
 9                                                          .38


Steps in the solution of Example (1) may be outlined in order.

Step 1

Draw up work sheets like those of Tables 55 and 56. The correlation
coefficients between tests and criterion are entered in Table 54.

* Paterson, D. G., Elliott, R. M., et al., Minnesota Mechanical Ability
Tests (Minneapolis: The University of Minnesota Press, 1930), Appendix 4.
Step 2

Enter these coefficients with signs reversed in the V1 row of
Table 55.* The numbers heading the columns refer to the tests.

TABLE 55

                                  Tests
         1      2      3      4      5      6      7      8      9     10
V1   -.260  -.190  -.530  -.520  -.240  -.310  -.550  -.300  -.550  -.260
V2   -.095  -.118  -.222  -.250   .013  -.090         -.080  -.324  -.188
V3   -.010  -.049  -.097  -.091   .029  -.103         -.047         -.061
V4    .005  -.034         -.057   .054  -.072         -.053         -.056
V5   -.012  -.039                 .062  -.065         -.051         -.018

$V_1^2/Z_1 = (-.550)^2/1.000 = .3025$
$V_2^2/Z_2 = (-.324)^2/.832 = .1261$
$V_3^2/Z_3 = (-.097)^2/.563 = .0167$
$V_4^2/Z_4 = (-.057)^2/.489 = .0066$
$V_5^2/Z_5 = (-.065)^2/.775 = .0054$

Step 3

Enter the numbers 1.000 in each column of the row Z1 in Table 56.

TABLE 56

                                  Tests
         1      2      3      4      5      6      7      8      9     10
Z1   1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Z2    .910   .983   .686   .760   .788   .840          .840   .832   .983
Z3    .853   .945   .563   .559   .786   .839          .831          .854
Z4    .839   .931          .489   .748   .782          .829          .852
Z5    .796   .927                 .737   .775          .829          .637

$1/Z_2 = 1.202$ (Test 9)
$1/Z_3 = 1.776$ (Test 3)
$1/Z_4 = 2.045$ (Test 4)
Step 4

Select that test having the highest $V_1^2/Z_1$ quotient as the first
test of the battery. From Tables 55 and 56 we find that Tests 7 and 9
both have correlations of .550 with the criterion, and that these are
the largest r's in the table. Either Test 7 or Test 9 could be selected
as the first test of our battery. We have chosen Test 7 because it is
the more objective measure of performance.

* Correlation coefficients are assumed to be accurate to three or to
four decimals in subsequent calculations to avoid the loss of precision
which results when decimals are rounded to two places. (See p. 20.)


Step 5

Apply the Wherry shrinkage formula

$$\bar{R}^2 = 1 - K^2\,\frac{N-1}{N-m} \qquad (109)$$

in which $\bar{R}$ is the "shrunken" multiple correlation coefficient,
the coefficient from which chance error has been removed.* This
corrected $\bar{R}$ may be calculated in a systematic way as follows:

(1) Prepare a work sheet similar to that shown in Table 57.

TABLE 57

  a       b         c          d            e         f         g
  m    V²/Z        K²     (N-1)/(N-m)    K̄²        R̄²        R̄      Test
  0              1.0000    (N = 100)
  1    .3025      .6975      1.000       .6975     .3025     .5500     7
  2    .1261      .5714      1.010       .5771     .4229     .6503     9
  3    .0167      .5547      1.021       .5663     .4337     .6586     3
  4    .0066      .5481      1.031       .5651     .4349     .6595     4
  5    .0054      .5427      1.042       .5655     .4345     .6591     6

(2) Enter 1.000 in column c, row 0, under $K^2$. Enter N = 100 in
column d.

(3) Enter the quotient $V_1^2/Z_1$ in column b, row 1. In our problem
$V_1^2/Z_1 = (-.550)^2/1.000 = .3025$.†

(4) Subtract .3025 from 1.000 to give .6975 as the entry in column c
under $K^2$.

(5) Find the quotient $(N-1)/(N-m)$ and record it in column d.
$(N - 1) = 99$; and since m (number of tests selected) is 1, $(N - m)$
also equals 99 and $(N-1)/(N-m) = 1.000$.

(6) Write the product of columns c and d in column e:
.6975 × 1.000 = .6975.

(7) Subtract the column e entry from 1.000 to obtain $\bar{R}^2$ (the
square of the shrunken multiple correlation coefficient) in column f.
In Table 57 the $\bar{R}^2$ entry, of course, is .3025.

(8) Find the square root of the column f entry and enter the result in
column g under $\bar{R}$. Our entry is .5500, the correlation of Test 7
with the criterion. No correction for chance errors is necessary for
one test.

* Wherry, R. J., "A New Formula for Predicting the Shrinkage of the
Coefficient of Multiple Correlation," Annals of Mathematical
Statistics, 1931, Vol. 2.
† Quotient is taken to four decimals (p. 406).
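Row 1 of Table 57 can be reproduced directly from formula (109). This is a minimal sketch; the function name `shrunken_r` is an illustrative assumption, and $K^2$ is taken as the column c entry (1.000 less the accumulated $V^2/Z$ quotients).

```python
import math

def shrunken_r(k2, n, m):
    """Formula (109): R-bar squared = 1 - K^2 (N - 1)/(N - m); returns R-bar."""
    r2_bar = 1.0 - k2 * (n - 1) / (n - m)
    return math.sqrt(max(r2_bar, 0.0))

# Column c, row 1: 1.000 minus the quotient V1^2 / Z1 for Test 7.
k2_row1 = 1.0 - (-0.550) ** 2 / 1.000

# With one test selected (m = 1) the correction factor is 99/99 = 1.
r_bar_row1 = shrunken_r(k2_row1, n=100, m=1)

print(round(k2_row1, 4), round(r_bar_row1, 4))
```

With m = 1 the shrunken coefficient equals the correlation of Test 7 with the criterion, as item (8) states.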
Step 6

To aid in the selection of a second test to be added to our battery
of one, a work sheet similar to that shown in Table 58 should be
prepared. Calculations in Table 58 are as follows:

(1) Leave the $a_1$ row blank.

(2) Enter in row $b_1$ the correlations of Test 7 (first selected test)
with each of the other tests in Table 54. These r's are .300, .130,
.560, etc., and are entered in the columns numbered to correspond to
the tests. Enter 1.000 in the column for Test 7. In column -C enter the
correlation of Test 7 with the criterion with sign reversed, i.e., as
-.550.

(3) Write the algebraic sum of the $b_1$ entries in the "Check Sum"
column. This sum is 3.730.

(4) Multiply each $b_1$ entry by the negative reciprocal of the $b_1$
entry for Test 7, the first selected test. Enter these products in the
$c_1$ row. Since the negative reciprocal of Test 7's $b_1$ entry is
-1.000, we need simply write the $b_1$ entries in the $c_1$ row with
signs reversed.
Step 7

Draw a vertical line under Test 7 in Table 55 to show that it has
been selected. To select a second test proceed as follows:

(1) To each $V_1$ entry in Table 55, add algebraically the product of
the $b_1$ entry in the criterion (-C) column of Table 58 by the $c_1$
entry for each of the other tests. Enter results in the $V_2$ row. The
formula for $V_2$ is $V_2 = V_1 + b_1$ (criterion) × $c_1$ (each test).
To illustrate, from Table 58 and Table 55 we have

For Test 1: $V_2 = -.260 + (-.550)(-.300) = -.260 + .165 = -.095$
For Test 4: $V_2 = -.520 + (-.550)(-.490) = -.520 + .270 = -.250$
For Test 9: $V_2 = -.550 + (-.550)(-.410) = -.550 + .226 = -.324$

(2) To each $Z_1$ in Table 56 add algebraically the product of the
$b_1$ and $c_1$ entries for each test got from Table 58. Enter these
results in the $Z_2$ row. The formula is $Z_2 = Z_1 + b_1$ (a given
test) × $c_1$ (same test). To illustrate, from Tables 55 and 58:

For Test 1: $Z_2 = 1.000 + (.300)(-.300) = 1.000 - .090 = .910$
For Test 4: $Z_2 = 1.000 + (.490)(-.490) = 1.000 - .240 = .760$
For Test 9: $Z_2 = 1.000 + (.410)(-.410) = 1.000 - .168 = .832$
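The arithmetic of Step 7 can be sketched in code for the three tests used as illustrations. The dictionary names below are illustrative assumptions; the correlations are those quoted in the worked lines (Tests 1, 4, and 9 with the criterion and with Test 7). Because the code carries full precision rather than three decimals, the $V_2^2/Z_2$ quotient for Test 9 comes out near .1266 instead of the .1261 obtained from rounded entries.

```python
# Correlations with the criterion and with Test 7 (first selected test).
r_crit = {1: 0.26, 4: 0.52, 9: 0.55}
r_with_7 = {1: 0.30, 4: 0.49, 9: 0.41}
r_c7 = 0.55

b1_crit = -r_c7            # b1 entry in the criterion (-C) column
v2, z2 = {}, {}
for t in r_crit:
    b1 = r_with_7[t]       # b1 entry for test t
    c1 = -b1               # b1 times the negative reciprocal (-1.000) of Test 7's entry
    v2[t] = -r_crit[t] + b1_crit * c1   # V2 = V1 + b1(criterion) * c1(test)
    z2[t] = 1.0 + b1 * c1               # Z2 = Z1 + b1(test) * c1(test)

# The test with the largest V2^2 / Z2 is the next one selected.
quotients = {t: v2[t] ** 2 / z2[t] for t in v2}
best = max(quotients, key=quotients.get)
print(best, {t: round(q, 4) for t, q in quotients.items()})
```

Among these three tests, Test 9 has the largest quotient, in line with the choice made in the text.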

Step 8

Now select the test having the largest $V_2^2/Z_2$ quotient as the
second test for our battery. The quantity $V_2^2/Z_2$ is a measure of
the amount which the second test contributes to the squared multiple
correlation coefficient, $R^2$. From Tables 55 and 56 we find that
Test 9 has the largest $V_2^2/Z_2$ quotient:
$(-.324)^2/.832 = .1261$.


Step 9

To calculate the new multiple correlation coefficient when Test 9
is added to Test 7, proceed as follows:

(1) The quantity $V_2^2/Z_2 = .1261$ is entered in column b, row 2 of
Table 57.

(2) Subtract the ratio $V_2^2/Z_2$ from the $K^2$ entry in column c,
row 1, and enter the result in column c, row 2; e.g., for the entry in
column c, row 2, we have .6975 - .1261, or .5714.

(3) Find the quotient $(N-1)/(N-m)$. Since N = 100 and m (number of
tests chosen) = 2, we have $(N-1)/(N-m) = 99/98$, or 1.010, as the
column d, row 2 entry.

(4) Record the product of the c and d columns in column e:
.5714 × 1.010 = .5771.

(5) Subtract .5771 (column e) from 1.000 to give .4229 as the
entry in column f, row 2.

(6) Take the square root of .4229 and enter the result, .6503, in
column g. This is the multiple coefficient $\bar{R}$ corrected for
chance errors. It is clear that by adding Test 9 to Test 7 we
increase $\bar{R}$ from .5500 to .6503, a substantial gain.
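The bookkeeping of Step 9 repeats for every row of Table 57, so it can be sketched as a loop over the column b quotients. The list `increments` simply copies column b of Table 57; the variable names are illustrative. The code does not round column d to three decimals, so its $\bar{R}$ values can differ from the table in the fourth decimal.

```python
import math

N = 100
# Column b of Table 57: V^2/Z quotients for Tests 7, 9, 3, 4, 6 in order of selection.
increments = [0.3025, 0.1261, 0.0167, 0.0066, 0.0054]

k2 = 1.0            # column c, row 0
r_bars = []
for m, inc in enumerate(increments, start=1):
    k2 -= inc                          # column c: K^2 after adding the m-th test
    ratio = (N - 1) / (N - m)          # column d
    r2_bar = 1.0 - k2 * ratio          # column f (column e is k2 * ratio)
    r_bars.append(math.sqrt(r2_bar))   # column g

for m, r in enumerate(r_bars, start=1):
    print(m, round(r, 4))
```

The fifth value falls below the fourth, which is the signal used later to stop adding tests.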


Step 10

Since $\bar{R}$ for Tests 7 and 9 is larger than the correlation for
Test 7 alone, we proceed to add a third test in the hope of further
increasing the multiple $\bar{R}$. The procedure is shown in Step 11.

Step 11

Return to Table 58 and

(1) Record in the $a_2$ row the correlation coefficient of the second
selected test (i.e., Test 9) with each of the other tests and with the
criterion. (Read r's from Table 54.) The correlation of Test 9 with the
criterion is entered with sign reversed (i.e., as -.550).

(2) Enter the algebraic sum of the $a_2$ entries (i.e., 3.580) in the
Check Sum column.

(3) Draw a vertical line down through the $b_2$ and $c_2$ rows for
Test 7, the first selected test. This indicates that Test 7 has
already been chosen.

(4) Compute the $b_2$ entry for each test by adding to the $a_2$ entry
the product of the $b_1$ entry of the given test by the $c_1$ entry of
the second selected test (i.e., Test 9). The formula is
$b_2 = a_2 + b_1$ (given test) × $c_1$ (second selected test). To
illustrate:

For Test 2: $b_2 = .230 + (.130)(-.410) = .230 - .053 = .177$
For Test 6: $b_2 = .130 + (.400)(-.410) = .130 - .164 = -.034$
For Test 10: $b_2 = .380 + (.130)(-.410) = .380 - .053 = .327$

Compute $b_2$ entries for criterion and Check Sum column in the same
way. For the criterion column we have $-.550 + (-.550)(-.410)$ or
-.324. For the Check Sum column we have $3.580 + (3.730)(-.410)$ or
2.051.

(5) There are three checks for the $b_2$ row. (a) The entry for the
second selected test (Test 9) should equal the $Z_2$ entry for the same
test in Table 56. Note that both entries are .832. (b) The entry in the
criterion column should equal the $V_2$ entry of the second selected
test (Test 9) in Table 55; both entries are -.324. (c) The entry in the
Check Sum column should equal the sum of all of the entries in the
$b_2$ row. Adding .217, .177, .320, etc., we get 2.051, checking our
calculations to the third decimal.

(6) Multiply each $b_2$ entry by the negative reciprocal of the $b_2$
entry for the second selected test (Test 9), and record results in the
$c_2$ row. The negative reciprocal of .832 is -1.202. The $c_2$ entry
for Test 1 is .217 × -1.202 or -.261; for Test 2, .177 × -1.202 or
-.213; and so on for the other tests. For the criterion column the
$c_2$ entry is (-.324) × -1.202 or .389; and for the Check Sum the
$c_2$ entry is 2.051 × -1.202 or -2.465.

(7) There are three checks for the $c_2$ entries. (a) The $c_2$ row
entry of the second selected test (Test 9) should be -1.000. (b) The
$c_2$ entry in the Check Sum column should equal the sum of all $c_2$
entries. Adding the $c_2$ entries in Table 58, we find the sum to be
-2.465, the Check Sum entry. (c) The product of the $b_2$ and $c_2$
entries in the criterion column should equal the quotient $V_2^2/Z_2$
in column b, row 2, of Table 57 in absolute value. Note that the
product (-.324 × .389) = -.1261, thus checking our entry (disregard
signs).
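The $b_2$ and $c_2$ rows and their checks can be sketched as follows. The correlations of Tests 7 and 9 with the other tests are the Table 54 values used in the worked arithmetic above; the dictionary layout and names are illustrative assumptions, and the entry for Test 7 is omitted because it reduces to zero once Test 7 has been selected.

```python
others = [1, 2, 3, 4, 5, 6, 8, 10]
r7 = {1: .30, 2: .13, 3: .56, 4: .49, 5: .46, 6: .40, 8: .40, 10: .13}
r9 = {1: .34, 2: .23, 3: .55, 4: .61, 5: .23, 6: .13, 8: .25, 10: .38}
r79, rc7, rc9 = .41, .55, .55

c1_test9 = -r79                       # c1 entry for the second selected test

# b2 = a2 + b1(given test) * c1(second selected test)
b2 = {t: r9[t] + r7[t] * c1_test9 for t in others}
b2[9] = 1.0 + r79 * c1_test9          # Test 9's own column (a2 entry is 1.000)
b2_crit = -rc9 + (-rc7) * c1_test9    # criterion (-C) column

# c2 = b2 times the negative reciprocal of Test 9's b2 entry
neg_recip = -1.0 / b2[9]
c2 = {t: b * neg_recip for t, b in b2.items()}
c2_crit = b2_crit * neg_recip

check_sum_b2 = sum(b2.values()) + b2_crit
print(round(b2[9], 3), round(b2_crit, 3), round(check_sum_b2, 3))
print(round(neg_recip, 3), round(c2_crit, 3), round(abs(b2_crit * c2_crit), 4))
```

The three checks of item (5) appear as `b2[9]` (the $Z_2$ entry for Test 9), `b2_crit` (its $V_2$ entry), and `check_sum_b2`; small fourth-decimal differences from the text come from the text's rounding to three places.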



Step 12

Draw a vertical line under Test 9 in Table 55, to indicate that it
has been selected as our second test. Then proceed as in Step 7 to
compute $V_3$ and $Z_3$ in order to select a third test. The formula
for $V_3$ is $V_3 = V_2 + b_2$ (criterion) × $c_2$ (each test). The
formula for $Z_3$ is $Z_3 = Z_2 + b_2$ (a given test) × $c_2$ (same
test). The third selected test is that one which has the largest
$V_3^2/Z_3$ quotient in Table 55. This is Test 3, for which
$V_3 = -.222 + (-.324)(-.385)$ or -.097; and
$Z_3 = .686 + (.320)(-.385) = .563$. The quotient
$V_3^2/Z_3 = .0167$.


Step 13

Entering .0167 (the quotient $V_3^2/Z_3$) in column b, row 3, of
Table 57, follow the procedure of Step 9 to get $\bar{R} = .6586$. Note
that $(N-1)/(N-m) = 99/97$ or 1.021; and that the new $\bar{R}$ is
larger than the .6503 found for the two tests, 7 and 9. We include
Test 3 in our battery, therefore, and proceed to calculate $a_3$,
$b_3$, and $c_3$ (Table 58), following Step 11, in order to select a
fourth test.


Step 14

The $a_3$ entries in Table 58 are the correlations of Test 3 with each
of the other tests including the criterion. The criterion correlation
is entered in the -C column with a negative sign (i.e., as -.530).

(1) The formula for $b_3$ is $b_3 = a_3 + b_1$ (given test) × $c_1$
(third selected test) + $b_2$ (given test) × $c_2$ (third selected
test). To illustrate,

For Test 1: $b_3 = .340 + (.300)(-.560) + (.217)(-.385) = .088$
For Test 4: $b_3 = .630 + (.490)(-.560) + (.409)(-.385) = .199$

Check the $b_3$ entries by Step 11 (5). (a) Note that the $b_3$ entry
for the third selected test (Test 3) equals the $Z_3$ entry for Test 3
in Table 56, namely, .563. (b) The entry in the criterion column equals
the $V_3$ entry of the third selected test (Test 3) in Table 55, i.e.,
-.097. (c) The Check Sum entry (1.161) equals the sum of the entries in
the $b_3$ row.

(2) The formula for $c_3$ is $b_3$ × the negative reciprocal of the
$b_3$ entry for the third selected test (Test 3). The negative
reciprocal of .563 is -1.776. To illustrate the calculation for
Test 5, $c_3 = .146 \times -1.776 = -.259$. Check the $c_3$ entries by
Step 11 (7). (a) The $c_3$ row entry of the third selected test
(Test 3) equals -1.000. (b) The $c_3$ entry in the Check Sum column,
namely, -2.062, equals the sum of the $c_3$ row. (c) The product of the
$b_3$ and $c_3$ entries in the criterion column (namely, -.097 × .172)
equals the quotient $V_3^2/Z_3$ (i.e., .0167) in absolute value.


Step 15

Repeat Step 12 to find $V_4$ and $Z_4$. The formula for $V_4$ is
$V_4 = V_3 + b_3$ (criterion) × $c_3$ (each test). Also, the formula
for $Z_4$ is $Z_4 = Z_3 + b_3$ (a given test) × $c_3$ (same test). For
Test 4, $V_4 = -.091 + (-.097)(-.353)$ or -.057; and
$Z_4 = .559 + (.199)(-.353)$ or .489. The quotient $V_4^2/Z_4$ equals
$(-.057)^2/.489$ or .0066. While none of the $V_4$ entries is large,
Test 4 has the largest $V_4^2/Z_4$ quotient, and hence is selected as
our fourth test. Enter .0066 (the quotient $V_4^2/Z_4$) in column b,
row 4, of Table 57. Follow the procedure of Step 9 to get
$\bar{R} = .6595$. Note that $(N-1)/(N-m)$ is 99/96 or 1.031; and that
the new $\bar{R}$ is but slightly larger than the $\bar{R}$ of .6586
found for the three tests, 7, 9, and 3. When $\bar{R}$ decreases or
fails to increase, there is no point in adding new tests to the
battery. The increase in $\bar{R}$ is so small as a result of adding
Test 4 that it is hardly profitable to enlarge our battery by a fifth
test. We shall add a fifth test, however, in order to illustrate a
further step in the selection process.


Step 16

To choose a fifth test, calculate $a_4$, $b_4$, and $c_4$, following
Step 11, and enter the results in Table 58. The $a_4$ entries are the
correlations of the fourth selected test (Test 4) with each of the
other tests including the criterion (with sign reversed).

(1) The formula for $b_4$ may readily be written by analogy to the
formulas for $b_3$ and $b_2$ as follows: $b_4 = a_4 + b_1$ (given
test) × $c_1$ (fourth selected test) + $b_2$ (given test) × $c_2$
(fourth selected test) + $b_3$ (given test) × $c_3$ (fourth selected
test). To illustrate

For Test 6: $b_4 = .300 + (.400)(-.490) + (-.034)(-.492) + (.179)(-.353) = .058$
For Test 10: $b_4 = .560 + (.130)(-.490) + (.327)(-.492) + (.031)(-.353) = .324$

Check the $b_4$ entries by Step 11 (5). (a) The $b_4$ entry for the
fourth selected test (Test 4) equals the $Z_4$ entry for Test 4 in
Table 56, namely, .489. (b) The entry in the criterion column equals
the $V_4$ entry of the fourth selected test (Test 4), i.e., -.057.
(c) The Check Sum (.715) equals the sum of the entries in the $b_4$
row.

(2) To find the entries $c_4$, multiply each $b_4$ by the negative
reciprocal of the $b_4$ entry for the fourth selected test (Test 4).
The negative reciprocal of .489 is -2.045. To illustrate,

For Test 1: $c_4 = -.145 \times -2.045 = .297$.

Check the $c_4$ entries by Step 11 (7). (a) The $c_4$ row entry of the
fourth selected test (Test 4) equals -1.000. (b) The $c_4$ entry in the
Check Sum column, namely, -1.462, equals the sum of the $c_4$ row.
(c) The product of the $b_4$ and $c_4$ entries in the criterion column
(namely, -.057 × .117) equals the quotient $V_4^2/Z_4$ (i.e., .0066)
in absolute value.


Step 17

Repeat Step 12 to find $V_5 = V_4 + b_4$ (criterion) × $c_4$ (each
test); and $Z_5 = Z_4 + b_4$ (a given test) × $c_4$ (same test). Test 6
has the largest $V_5^2/Z_5$ quotient (i.e., .0054) and this number is
entered in column b, row 5, of Table 57. Following Step 9, we get
$\bar{R} = .6591$. This multiple correlation coefficient is smaller
than the preceding $\bar{R}$. We need go no further, therefore, as we
have reached the point of diminishing returns and the addition of a
sixth test will not increase the multiple $\bar{R}$. It may be noted
that four (really three) tests constitute a battery which has the
highest validity of any combination of tests chosen from our list of
ten. The multiple $\bar{R}$ between the criterion and all ten tests
would be somewhat lower, when corrected for chance error, than the
$\bar{R}$ we have found for our battery of four tests. The
Wherry-Doolittle method not only selects the most economical battery
but saves a large amount of statistical work.
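The whole selection cycle (choose the largest $V^2/Z$, update $K^2$, shrink, then partial the chosen test out of the rest) can be sketched compactly. This is not the work-sheet layout of Tables 55 to 58; it is an assumed equivalent formulation that keeps a matrix of residual covariances and sweeps out each selected test, which yields the same V and Z rows. The function name `wherry_doolittle` and the data layout are illustrative, and the example uses only Tests 7, 9, 3, and 4, so it cannot show the rejection of a fifth test.

```python
import math

def wherry_doolittle(r, r_crit, n):
    """Forward selection sketch: stop when the shrunken R fails to increase."""
    tests = list(r_crit)
    # s holds residual covariances among tests; its diagonal is the Z row.
    s = {i: {j: (1.0 if i == j else r[i][j]) for j in tests} for i in tests}
    v = {j: -r_crit[j] for j in tests}      # V row (signs reversed)
    k2, best, chosen, history = 1.0, 0.0, [], []
    while len(chosen) < len(tests):
        rest = [j for j in tests if j not in chosen]
        pick = max(rest, key=lambda j: v[j] ** 2 / s[j][j])
        k2_new = k2 - v[pick] ** 2 / s[pick][pick]
        m = len(chosen) + 1
        r_bar = math.sqrt(max(1.0 - k2_new * (n - 1) / (n - m), 0.0))
        if r_bar <= best:
            break                            # shrunken R no longer increases
        chosen.append(pick)
        history.append(r_bar)
        k2, best = k2_new, r_bar
        # Partial the chosen test out of every V and Z entry (one Doolittle cycle).
        piv, vp = s[pick][pick], v[pick]
        col = {j: s[j][pick] for j in tests}
        for j in tests:
            v[j] -= vp * col[j] / piv
            for l in tests:
                s[j][l] -= col[j] * col[l] / piv
    return chosen, history

r = {7: {9: .41, 3: .56, 4: .49}, 9: {7: .41, 3: .55, 4: .61},
     3: {7: .56, 9: .55, 4: .63}, 4: {7: .49, 9: .61, 3: .63}}
r_crit = {7: .55, 9: .55, 3: .53, 4: .52}
chosen, history = wherry_doolittle(r, r_crit, n=100)
print(chosen, [round(x, 4) for x in history])
```

Ties are broken by list order, so Test 7 is taken before Test 9, as in the text. The shrunken coefficients agree with column g of Table 57 to about three decimals; the remaining differences come from the three-decimal rounding of the hand computation.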




2. Calculation of the multiple regression equation for tests selected
by the Wherry-Doolittle Method

Steps involved in setting up a multiple regression equation for the
tests selected in Table 58 may be set down as follows:
TABLE 59

          7        9        3        4       -C
c1   -1.000    -.410    -.560    -.490     .550
c2            -1.000    -.385    -.492     .389
c3                     -1.000    -.353     .172
c4                              -1.000     .117

Step 1

Draw up a work sheet like that shown in Table 59. Enter the c
entries for the four selected tests (namely, 7, 9, 3, and 4) and for
the criterion, following the order in which the tests were selected
for the battery. When equated to zero, each row in Table 59 is an
equation defining the beta weights. For our four tests, the equations
are

$-1.000\beta_7 - .410\beta_9 - .560\beta_3 - .490\beta_4 + .550 = 0$
$-1.000\beta_9 - .385\beta_3 - .492\beta_4 + .389 = 0$
$-1.000\beta_3 - .353\beta_4 + .172 = 0$
$-1.000\beta_4 + .117 = 0$

Step 2

Solve the fourth equation to find $\beta_4 = .117$.

Step 3

Substitute for $\beta_4 = .117$ in the third equation to get
$\beta_3 = .131$.

Step 4

Substitute for $\beta_3$ and $\beta_4$ in the second equation to get
$\beta_9 = .280$. Finally, substitute for $\beta_3$, $\beta_4$, and
$\beta_9$ in the first equation to get $\beta_7 = .305$.
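Steps 2 to 4 are a back-substitution through the triangular system of Table 59, which can be sketched as below. The list layout is an illustrative assumption: each row holds the coefficients on $\beta_7$, $\beta_9$, $\beta_3$, $\beta_4$ and then the constant from the -C column. The code does not round between steps, so the results can differ from the text by about .001.

```python
order = [7, 9, 3, 4]   # tests in order of selection

# Rows of Table 59: coefficients on beta_7, beta_9, beta_3, beta_4, then the -C entry.
rows = [
    [-1.000, -0.410, -0.560, -0.490, 0.550],
    [0.0, -1.000, -0.385, -0.492, 0.389],
    [0.0, 0.0, -1.000, -0.353, 0.172],
    [0.0, 0.0, 0.0, -1.000, 0.117],
]

beta = [0.0] * 4
for i in reversed(range(4)):
    # Row i reads: rows[i][i]*beta_i + sum_{j>i} rows[i][j]*beta_j + constant = 0
    known = rows[i][4] + sum(rows[i][j] * beta[j] for j in range(i + 1, 4))
    beta[i] = known / -rows[i][i]

betas = dict(zip(order, beta))
print({t: round(b, 3) for t, b in betas.items()})
```

The loop starts from the last equation (Test 4) and works upward, which is the same order of substitution the steps describe.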


Step 5

The regression equation for predicting the criterion from the four
selected tests (7, 9, 3, and 4) may be written in $\sigma$-score form
by means of formula (104), page 393, as follows:

$$z_c = \beta_7 z_7 + \beta_9 z_9 + \beta_3 z_3 + \beta_4 z_4$$

in which $\beta_7 = \beta_{c7.934}$; $\beta_9 = \beta_{c9.734}$;
$\beta_3 = \beta_{c3.794}$; $\beta_4 = \beta_{c4.793}$. Substituting
for the $\beta$'s we have

$$z_c = .305z_7 + .280z_9 + .131z_3 + .117z_4.$$

To predict the criterion score of any subject in our group, substitute
his scores in Tests 7, 9, 3, and 4 (expressed as $\sigma$-scores) in
this equation.


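Step 6

To write the regression equation in score form the $\beta$'s must be
transformed into b's by means of formula (103), page 393, as follows:

$$b_7 = \frac{\sigma_c}{\sigma_7}\beta_7;\quad b_9 = \frac{\sigma_c}{\sigma_9}\beta_9;\quad b_3 = \frac{\sigma_c}{\sigma_3}\beta_3;\quad b_4 = \frac{\sigma_c}{\sigma_4}\beta_4$$

The $\sigma$'s are the SD's of the test scores: $\sigma_7$ of Test 7,
$\sigma_9$ of Test 9, $\sigma_c$ of the criterion, etc. In general,
$b_p = \frac{\sigma_c}{\sigma_p}\beta_p$.

Step 7

The regression equation in score form may now be written

$$\bar{X}_c = b_7X_7 + b_9X_9 + b_3X_3 + b_4X_4 + K \;{}^{*} \qquad (99)\ \text{page 391}$$

and the

$$\sigma_{(est.\ X_c)} = \sigma_c\sqrt{1 - R^2_{c(7934)}} \qquad (33)\ \text{page 162}$$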


3. Checking the $\beta$ weights and multiple R

Step 1

The $\beta$ weights may be checked by formula (108), page 396, in
which $R^2$ is expressed in terms of beta coefficients. In the present
example, we have

$$R^2_{c(7934)} = \beta_7 r_{c7} + \beta_9 r_{c9} + \beta_3 r_{c3} + \beta_4 r_{c4}$$

in which c equals the criterion and the r's are the correlations
between the criterion (c) and the Tests 7, 9, 3, and 4. Substituting
for the r's and $\beta$'s (computed in the last section) we have

$R^2_{c(7934)} = .305 \times .550 + .280 \times .550 + .131 \times .530 + .117 \times .520$
$= .1678 + .1540 + .0694 + .0608 = .4520$

$R_{c(7934)} = .6723$

From $R^2_{c(7934)}$ we know that our battery accounts for 45% of the
variance of the criterion. Also (p. 396) our four tests (7, 9, 3, and 4)
contribute 17%, 15%, 7%, and 6%, respectively, to the variance of
the criterion.
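The check in Step 1 amounts to summing the products of each $\beta$ weight and the matching criterion correlation. A minimal sketch, with illustrative variable names and the rounded $\beta$'s quoted in the text:

```python
import math

betas = {7: 0.305, 9: 0.280, 3: 0.131, 4: 0.117}
r_crit = {7: 0.550, 9: 0.550, 3: 0.530, 4: 0.520}

# Each product beta * r is that test's contribution to R squared.
contrib = {t: betas[t] * r_crit[t] for t in betas}
r2 = sum(contrib.values())
r = math.sqrt(r2)

percents = {t: round(100 * c) for t, c in contrib.items()}
print(round(r2, 4), round(r, 4), percents)
```

The percentages printed are the rounded contributions of Tests 7, 9, 3, and 4 to the variance of the criterion.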




* This equation is not written for our four tests because means and SD’s are 
not given in Table 54. 
Step 2

The $R^2$ of .4520 calculated above should equal $(1 - K^2)$ when
$K^2$ is taken from column c, row 4, in Table 57. From Table 57 we find
that $1 - K^2 = 1 - .5481$ or .4519, which checks the $R^2$ found
above, and hence the $\beta$ weights, very closely.


Step 3

It will be noted that the multiple correlation coefficient of .6723
found above is somewhat larger than the shrunken $\bar{R}$ of .6595
found between the criterion and our battery of four tests in Table 57.
The multiple correlation coefficient obtained from a sample always
tends, through the operation of chance errors, to be larger than the
correlation in the population from which the sample was drawn,
especially when N is small or the number of test variables large. For
this reason, the calculated R must be "adjusted" in order to give us a
better estimate of the correlation in the population.* The
relationship of the $\bar{R}$, corrected for chance errors, to the R
as usually calculated, is given by the following equation:

$$\bar{R}^2 = \frac{(N - 1)R^2 - (m - 1)}{N - m} \qquad (110)$$

(relation of R to $\bar{R}$ corrected for chance errors)

Substituting .4520 for $R^2$, 99 for $(N - 1)$, 96 for $(N - m)$ and 3
for $(m - 1)$, we have from (110) that

$$\bar{R}^2 = \frac{99 \times .4520 - 3}{96} = .4349$$

and

$\bar{R} = .6595$ (see Table 57)

The $\bar{R}$ of .6595 is the corrected multiple correlation between
our criterion and test battery, or the multiple correlation coefficient
estimated for the population from which our sample was drawn. In the
present problem, shrinkage in multiple R is quite small
(.6723 - .6595 = .0128) as the sample is fairly large and there are
only four tests in the multiple regression equation.
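Formula (110) and formula (109) are two ways of writing the same correction, which a short sketch can confirm for the present example. The function names are illustrative assumptions; $K^2$ is taken as $1 - R^2$, as in Step 2.

```python
import math

def adjusted_r2(r2, n, m):
    """Formula (110): shrunken R squared from the ordinary R squared."""
    return ((n - 1) * r2 - (m - 1)) / (n - m)

def shrunken_r2_from_k2(k2, n, m):
    """Formula (109), with K^2 = 1 - R^2."""
    return 1.0 - k2 * (n - 1) / (n - m)

r2 = 0.4520
r2_bar = adjusted_r2(r2, n=100, m=4)
r_bar = math.sqrt(r2_bar)
same = shrunken_r2_from_k2(1.0 - r2, n=100, m=4)

print(round(r2_bar, 4), round(r_bar, 4), round(math.sqrt(r2) - r_bar, 4))
```

The last printed figure is the shrinkage, close to the .0128 quoted in the text.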

* Ezekiel, M., Methods of Correlation Analysis, op. cit., 323-324. 
II. Limitations to the Use of Partial
and Multiple Correlation


Certain cautions in the use of partial and multiple correlation may 
be indicated in concluding this chapter. 

(1) In order that partial coefficients of correlation be valid meas- 
ures of relationship, it is necessary that all zero order coefficients be 
computed from data in which the regression is linear. 

(2) The number of cases in a multiple correlation problem should
be large, especially if there are a number of variables; otherwise the
coefficients calculated from the data will have little significance.
Coefficients which are misleadingly high or low may be obtained
when studies which involve many variables are based on relatively
few cases. The question of accuracy of computation is also involved.
A general rule advocated by many workers is that results should be
carried to as many decimals as there are variables in the problem.
How strictly this rule is to be followed must depend upon the
accuracy of the original measures.

(3) A serious limitation to a clear-cut interpretation of a partial r
arises from the fact that most of the tests employed by psychologists
probably depend upon a large number of "determiners." When we
"partial out" the influence of clear-cut and relatively objective
factors such as age, height, school grade, etc., we have a reasonably
clear notion of what the "partials" mean. But when we attempt to render
variability due to "logical memory" constant by partialling out
memory test scores from the correlation between general intelligence
test scores and educational achievement, the result is by no means so
unequivocal. The abilities determining the scores in general
intelligence and in school achievement undoubtedly overlap the memory
test in other respects than in the "memory" involved. Partialling out
a memory test score from the correlation between general intelligence
and educational achievement, therefore, will render constant the
influence of many factors not strictly "memory," i.e., partial out too
much.

To illustrate this point again, it would be fallacious to interpret
the partial correlation between reading comprehension and arithmetic,
say, with the influence of "general intelligence" partialled out,
as giving the net relationship between these two variables for a
constant degree of intelligence. Both reading and arithmetic enter with
heavy, but unknown, weight into most general intelligence tests;
hence the partial correlation between these two, for general
intelligence constant, cannot be interpreted in a clear-cut and
meaningful way.

Partial r's obtained from psychological and educational tests, 
though often difficult to interpret, may be used in multiple regression 
equations when the purpose is to determine the relative weight to 
be assigned the various tests of a battery. But we should be cautious 
in attempting to give psychological meaning to such residual, i.e., 
partial, r’s. Several writers have discussed this problem, and should 
be referred to by the investigator who plans to use partial and multiple
correlation extensively.*

(4) Perhaps the chief limitation to R, the coefficient of multiple
correlation, is the fact that, since it is always positive, variable
errors of sampling tend to accumulate and thus make the coefficient
too large. A correction to be applied to R, when the sample is small
and the number of variables la


PROBLEMS


1. The following data† were assembled for sixteen large cities (of around
500,000 inhabitants) in a study of factors making for variation in crime.

X₀ (criterion) = crime rate: number known offenses per 1000 inhabitants
X₁ = percentage of male inhabitants
X₂ = percentage of male native whites of native parentage
X₃ = percentage of foreign-born males
X₄ = number children under five per 1000 married women
     fifteen to forty-four years old
X₅ = number Negroes per 100 of population
X₆ = number male children of foreign-born parents per 100
     of population
X₇ = number males and females ten years and over, in manufacturing,
     per 100 of population

M₀ = 19.9   M₁ = 49.2   M₂ = 22.8   M₃ = 10.2   M₄ = 481.4   M₅ = 4.7   M₆ = 13.1   M₇ = 21.7
σ₀ = 7.9    σ₁ = 1.3    σ₂ = 7.2    σ₃ = 4.6    σ₄ = 74.4    σ₅ = 4.0   σ₆ = 4.2    σ₇ = 4.3

Intercorrelations be
1 2 3 4 5 6 7

с ж Айз & il ness 58] -50 -20
1 01 ОБ: 22-491 | 2-15 01 22
2 —92 —54 55  —93 — 80
3 Йй Sts 82 40
4 —.06 59 74
5 —67 = 
6 21

(a) By means of the Wherry-Doolittle method select those variables
    which give a maximum correlation with the criterion.
(b) Work out the regression equation in score form (p. 393) and
    σ(est. X₀).
(c) Determine the independent contribution of each of the selected
    factors to crime rate (to R²).
(d) Compare R and R̄. Why is the adjustment fairly large? (see p. 418)

2. (a) What is the probable crime rate (from Problem 1) for a city in
       which X₆ = 15.0, X₁ = 50%, X₅ = 6.0 and X₇ = 20.0?
   (b) For a city in which X₆ = 13, X₁ = 48%, X₅ = 5.0 and X₇ = 22.0?
   (c) By how much does the use of multiple R reduce σ(est. X₀)?

3. In Problem 4, page 402:
   (a) Work out the regression equation using the Wherry-Doolittle
       method.
   (b) How much shrinkage is there when Rc(23) is corrected for chance
       errors (p. 407)?

* ... from the Catholic University of America, 1932, 3, 1-39
† Ogburn, W. F., "Factors in the Vari... of the American Statistical Association, 1935, 30, 1...


ANSWERS


1. (a) The R̄'s are, for Test 6, .540; for Tests 6 and 1, .674; for Tests 6,
       1, and 5, .713; for Tests 6, 1, 5, and 7, .722. R̄ drops to .702 when
       Test 4 is added.
   (b) X̄₀ = −.42X₆ + 3.35X₁ + .82X₅ − .40X₇ − 134.59
       σ(est. X₀) = 5.47
   (c) R²₀(6157) = .121 + .242 + .210 + .043. Tests 6, 1, 5, and 7
       contribute 12%, 24%, 21%, and 4%, respectively.
   (d) R = .785; R̄ = .722; shrinkage is .063.
2. (a) 23.53
   (b) 16.05
   (c) From 7.9 to 5.5 or 30%
3. (b) R̄ is .59; Rc(23) = .60
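
The printed answers to Problems 1 and 2 can be checked with a short Python sketch. The coefficients are copied from answer 1(b). The shrinkage formula, R̄² = 1 − (1 − R²)(N − 1)/(N − m) with m taken as the number of predictors retained, is an assumption: it is the form that reproduces the printed R̄ = .722 from R = .785 with N = 16, and it may not be the exact expression given on p. 418. All function names are illustrative.

```python
import math

# Coefficients from answer 1(b); keys are the variable numbers X6, X1, X5, X7.
COEFFS = {6: -0.42, 1: 3.35, 5: 0.82, 7: -0.40}
INTERCEPT = -134.59


def predict_crime_rate(scores):
    """Evaluate the score-form regression equation for one city."""
    return INTERCEPT + sum(COEFFS[k] * scores[k] for k in COEFFS)


def shrunken_r(r, n, m):
    """Corrected multiple R (assumed form; m = number of predictors kept)."""
    r_bar_sq = 1.0 - (1.0 - r * r) * (n - 1) / (n - m)
    return math.sqrt(max(r_bar_sq, 0.0))


def standard_error_of_estimate(sigma, r):
    """sigma(est.) = sigma * sqrt(1 - R^2)."""
    return sigma * math.sqrt(1.0 - r * r)


city_a = {6: 15.0, 1: 50.0, 5: 6.0, 7: 20.0}   # Problem 2(a)
city_b = {6: 13.0, 1: 48.0, 5: 5.0, 7: 22.0}   # Problem 2(b)

if __name__ == "__main__":
    print(round(predict_crime_rate(city_a), 2))
    print(round(predict_crime_rate(city_b), 2))
    print(round(shrunken_r(0.785, 16, 4), 3))
    print(round(standard_error_of_estimate(7.9, 0.722), 2))
```

Under these assumptions the standard error of estimate is computed from the corrected R̄ rather than from R, which is what the printed value of 5.47 implies.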


APPENDIX OF TABLES


A. Areas, Normal Probability Curve
B. Ordinates of the Normal Probability Curve
C. Pearson r into Equivalent z
D. Table of t
E. Chi-square Table
F. F-Table
G. Calculation of T-Scores
H. Mean σ-distances from the mean, of various percents of a normal distribution
I. To Infer √(1 − r²) from r
J. Significance of Coefficients of Correlation
TABLE A  Fractional parts of the total area (taken as 10,000) under the
normal probability curve, corresponding to distances on the
baseline between the mean and successive points laid off from
the mean in units of standard deviation


Example: between the mean and a point 1.38σ (x/σ = 1.38) are found
41.62% of the entire area under the curve.


x/σ    .00    .01    .02    .03    .04    .09
0.0   0000   0040   0080   0120          0359
0.1   0398   0438   0478   0517          0753
0.2   0793   0832   0871   0910          1141
0.3   1179   1217   1255   1293          1517
0.4   1554   1591   1628   1664          1879

0.5   1915   1950   1985   2019          2224
0.6   2257   2291   2324   2357          2549
0.7   2580   2611   2642   2673          2852
0.8   2881   2910   2939   2967          3133
0.9   3159   3186   3212   3238          3389

1.0   3413   3438   3461                 3621
1.1   3643   3665   3686                 3830
1.2   3849   3869   3888                 4015
1.3   4032   4049   4066                 4177
1.4   4192   4207   4222                 4319

1.5   4332   4345   4357                 4441
1.6   4452   4463   4474                 4545
1.7   4554   4564   4573                 4633
1.8   4641   4649   4656                 4706
1.9   4713   4719   4726                 4767

2.0   4772   4778   4783                 4817
2.1   4821   4826   4830                 4857
2.2   4861   4864   4868                 4890
2.3   4893   4896   4898                 4916
2.4   4918   4920   4922                 4936

2.5   4938   4940   4941                 4952
2.6   4953   4955   4956                 4964
2.7   4965   4966   4967                 4974
2.8   4974   4975   4976                 4981
2.9   4981   4982   4982          4984   4986

x/σ    .00      .01      .05      .06      .07      .08      .09
3.0   4986.5   4986.9   4988.6   4988.9   4989.3   4989.7   4990.0
3.1   4990.3   4990.6   4991.8   4992.1   4992.4   4992.6   4992.9

x/σ    .00
3.2   4993.129
3.3   4995.166
3.4   4996.631
3.5   4997.674
3.6   4998.409
3.7   4998.922
3.8   4999.277
3.9   4999.519
4.0   4999.683
4.5   4999.966
5.0   4999.997133
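
Entries of Table A can be recomputed from the error function. The sketch below is illustrative only (the function names are not from the text); it returns the area between the mean and a point z standard deviations away, first as a fraction and then scaled to the table's 10,000 units.

```python
import math


def area_mean_to_z(z):
    """Fraction of the total area lying between the mean and a point z sigma away."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))


def table_a_entry(z):
    """The same area expressed in the table's units (total area = 10,000)."""
    return round(10000 * area_mean_to_z(z))


if __name__ == "__main__":
    # The example under the table caption: x/sigma = 1.38
    print(table_a_entry(1.38))
```

Because the curve is symmetrical, the same entry serves for distances measured above or below the mean, which is why the sketch takes the absolute value of z.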
TABLE B  Ordinates of the normal probability curve expressed as
fractional parts of the mean ordinate, y₀


The height of the ordinate erected at the mean can be computed from
y₀ = N/(σ√(2π)), where √(2π) = 2.51 and 1/√(2π) = .3989. The height of any
other ordinate, in terms of y₀, can be read from the table when one knows
the distance which the ordinate is from the mean. For example: the height
of an ordinate a distance of −2.37σ from the mean is .06029 y₀. Decimals
have been omitted in the body of the table.


100000 | 99995 | 99980 | 99955 | 99920 | 99875 99820 
99501 | 99396 |99283 | 99158 | 99025 98881 | 98728 | 98565 | 98393 | 98211 
98090 | 97819 | 97609 | 97390 | 97161 96993 | 96676 | 96420 | 96156 | 95882 
95600 | 95309 | 95010 | 94702 | 94387 | 94055 93723 | 93382 | 93024 | 92677 
92312 | 91399 | 91558 | 91169 | 90774 90371 | 89961 | 89543 | 89119 | 88688 


88250 | 87805 | 87353 | 86896 | 86432 | 85062 | 85488 | 85006 | 84519 | 84060 
89250 | 33023 | 82514 | 82010 | 81481 | 80957 | 80429 | 79896 | 79359 | 78817 
58527 | 27721 | 77167 | 76610 | 76048 | 75484 | 74916 | 74342 | 73769 | 75193 
T5219 72033 | 71448 | 70861 | 70272 | 69681 | 69087 | 68493 | 67596 67298 
72615 | 66097 | 65494 | 64891 | 64287 | 63683 | 63077 | 62472 61865 | 61259 


60653 | 60047 | 59440 58834 | 58228 | 57623 | 57017 | 56414 55810 | 55209 
54607 |54007 | 53409 52812 | 52214 | 51620 | 51027 50437 | 49848 | 49260 
48675 | 48092 | 47511 46933 | 46357 | 45783 | 45212 44644 | 44078 | 43516 
42956 | 42399 | 41845 41294 | 40747 | 40202 | 39661 39123 | 38569 | 38058 

35459 | 34950 | 34445 | 33944 33447 | 32954 


37531 | 37007 | 36487 35971 
1500 | 31023 | 30550 30082 | 29618 | 29158 | 28702 28251 
37804 31557 20923 26489 | 26059 | 25634 25213 | 24797 | 24385 | 23978 
23575 | 23176 | 22782 22392 | 22008 | 21627 | 21251 20879 | 20511 | 20148 
19790 | 19436 | 19086 18741 | 18400 | 18064 | 17732 17404 | 17081 | 16762 
16448 | 16137 | 15831 15530 15232 | 14939 | 14650 | 14364 14083 | 13806 
13000 | 12740 | 12483 12230 | 11981 | 11737 | 11496 11259 
11095 10205 10570 10347 | 10129 | 09914 09702 | 09495 | 09290 09090 
08892 | 08698 | 08507 08320 | 08136 | 07956 07778 | 07604 | 07433 02202 
07100 | 06939 | 06780 06624 | 06471 | 06321 06174 | 06099 | 05888 | 05750 
05222 | 05096 | 04973 04852 | 04734 | 04618 | 04505 


05614 05481 05350 
04179 04074 | 03972 {03873 03775 | 03680 03586 | 03494 
gezos [027a шз ee Esp ІНЕ 
02612 42 | 0: 2081 | 01536 
1876 | 01823 01772 01723 01674 01627 | 0 
01252 03928 01819 01367 01328 01288 01252 01215 01179 | 01145 




19 | 00598 | 00432 | 00309 00219 | 00153 | 00106 00073 | 00050 
00034 00829 ray | 00010 | 00006 | 00004 00003 | 00002 | 00001 | 00001 
00000 
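
The relation behind Table B can be sketched in a few lines of Python. The function names are illustrative; the formulas are the mean ordinate y₀ = N/(σ√(2π)) stated in the caption and the standard normal-curve ratio y/y₀ = exp(−z²/2).

```python
import math


def mean_ordinate(n, sigma):
    """Height of the ordinate erected at the mean: y0 = N / (sigma * sqrt(2*pi))."""
    return n / (sigma * math.sqrt(2.0 * math.pi))


def relative_ordinate(z):
    """Height of the ordinate z sigma from the mean, as a fraction of y0."""
    return math.exp(-0.5 * z * z)


if __name__ == "__main__":
    # The caption's example: an ordinate 2.37 sigma below the mean
    print(round(relative_ordinate(-2.37), 4))
    # With N = 1 and sigma = 1 the mean ordinate is 1/sqrt(2*pi)
    print(round(mean_ordinate(1, 1), 4))
```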


TABLE C  Conversion of a Pearson r into a corresponding Fisher's z
coefficient
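
The conversion that Table C tabulates is Fisher's z = ½ logₑ[(1 + r)/(1 − r)]. A minimal Python sketch follows; the function names are illustrative and not taken from the text.

```python
import math


def r_to_z(r):
    """Fisher's z for a Pearson r strictly between -1 and +1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_to_r(z):
    """Inverse conversion: the r corresponding to a given Fisher's z."""
    return math.tanh(z)


if __name__ == "__main__":
    print(round(r_to_z(0.50), 2))
    print(round(z_to_r(r_to_z(0.80)), 2))
```

For small r the two coefficients nearly coincide; they diverge more and more as r approaches 1.00.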
TABLE D  Table of t, for use in determining the reliability of statistics


Example: When the df are 35 and t = 2.03, the .05 in column 3 means that 5
times in 100 trials a divergence as large as that obtained may be expected in the
positive and negative directions.


Degrees of              Probability (P)
Freedom        0.10        0.05        0.02        0.01
    1      t = 6.34   t = 12.71   t = 31.82   t = 63.66
    2          2.92        4.30        6.96        9.92
    3          2.35        3.18        4.54        5.84
    4          2.13        2.78        3.75        4.60
    5          2.02        2.57        3.36        4.03
    6          1.94        2.45        3.14        3.71
    7          1.90        2.36        3.00        3.50
    8          1.86        2.31        2.90        3.36
    9          1.83        2.26        2.82        3.25
   10          1.81        2.23        2.76        3.17
   11          1.80        2.20        2.72        3.11
   12          1.78        2.18        2.68        3.06
   13          1.77        2.16        2.65        3.01
   14          1.76        2.14        2.62        2.98
   15          1.75        2.13        2.60        2.95
   16          1.75        2.12        2.58        2.92
   17          1.74        2.11        2.57        2.90
   18          1.73        2.10        2.55        2.88
   19          1.73        2.09        2.54        2.86
   20          1.72        2.09        2.53        2.84
   21          1.72        2.08        2.52        2.83
   22          1.72        2.07        2.51        2.82
   23          1.71        2.07        2.50        2.81
   24          1.71        2.06        2.49        2.80
   25          1.71        2.06        2.48        2.79
   26          1.71        2.06        2.48        2.78
   27          1.70        2.05        2.47        2.77
   28          1.70        2.05        2.47        2.76
   29          1.70        2.04        2.46        2.76
   30          1.70        2.04        2.46        2.75
   35          1.69        2.03        2.44        2.72
   40          1.68        2.02        2.42        2.71
   45          1.68        2.02        2.41        2.69
   50          1.68        2.01        2.40        2.68
   60          1.67        2.00        2.39        2.66
   70          1.67        2.00        2.38        2.65
   80          1.66        1.99        2.38        2.64
   90          1.66        1.99        2.37        2.63
  100          1.66        1.98        2.36        2.63
  125          1.66        1.98        2.36        2.62
  150          1.66        1.98        2.35        2.61
  200          1.65        1.97        2.35        2.60
  300          1.65        1.97        2.34        2.59
  400          1.65        1.97        2.34        2.59
  500          1.65        1.96        2.33        2.59
 1000          1.65        1.96        2.33        2.58
    ∞          1.65        1.96        2.33        2.58
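
Reading Table D amounts to a lookup by degrees of freedom and probability level. The Python sketch below copies only a few rows of the table and, as an assumption made here for illustration, falls back to the largest tabled df that does not exceed the df in hand (the conservative choice when the exact df is not tabled). The names are hypothetical.

```python
# A few rows copied from Table D: df -> {P: critical t}
T_TABLE = {
    10: {0.10: 1.81, 0.05: 2.23, 0.02: 2.76, 0.01: 3.17},
    20: {0.10: 1.72, 0.05: 2.09, 0.02: 2.53, 0.01: 2.84},
    30: {0.10: 1.70, 0.05: 2.04, 0.02: 2.46, 0.01: 2.75},
    35: {0.10: 1.69, 0.05: 2.03, 0.02: 2.44, 0.01: 2.72},
    60: {0.10: 1.67, 0.05: 2.00, 0.02: 2.39, 0.01: 2.66},
}


def critical_t(df, p):
    """Tabled t for the largest listed df not exceeding `df` (assumed rule)."""
    usable = [d for d in T_TABLE if d <= df]
    if not usable:
        raise ValueError("df is below the smallest row copied into T_TABLE")
    return T_TABLE[max(usable)][p]


def is_significant(t, df, p=0.05):
    """True when |t| reaches the tabled value at probability level p."""
    return abs(t) >= critical_t(df, p)


if __name__ == "__main__":
    # The caption's example: 35 df at the .05 level
    print(critical_t(35, 0.05))
    print(is_significant(2.50, 22, 0.05))
```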


TABLE G  To facilitate the calculation of T-scores


The percents refer to the percentage of the total frequency below a
given score + 1/2 of the frequency on that score. T-scores are
read directly from the given percentages.


Percent     T-score     Percent     T-score
  .0032       10         53.98        51
  .0048       11         57.93        52
  .007        12         61.79        53
  .011        13         65.54        54
  .016        14         69.15        55
  .023        15         72.57        56
  .034        16         75.80        57
  .048        17         78.81        58
  .069        18         81.59        59
  .097        19         84.13        60
  .13         20         86.43        61
  .19         21         88.49        62
  .26         22         90.32        63
  .35         23         91.92        64
  .47         24         93.32        65
  .62         25         94.52        66
  .82         26         95.54        67
 1.07         27         96.41        68
 1.39         28         97.13        69
 1.79         29         97.72        70
 2.28         30         98.21        71
 2.87         31         98.61        72
 3.59         32         98.93        73
 4.46         33         99.18        74
 5.48         34         99.38        75
 6.68         35         99.53        76
 8.08         36         99.65        77
 9.68         37         99.74        78
11.51         38         99.81        79
13.57         39         99.865       80
15.87         40         99.903       81
18.41         41         99.931       82
21.19         42         99.952       83
24.20         43         99.966       84
27.43         44         99.977       85
30.85         45         99.984       86
34.46         46         99.9890      87
38.21         47         99.9928      88
42.07         48         99.9952      89
46.02         49         99.9968      90
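
The two steps behind Table G can be sketched in Python: first the percent below a score plus half of those on it, then the T-score as 50 + 10z, where z is the normal deviate cutting off that percent. This assumes the usual definition of the T-scale (mean 50, σ 10); the function names are illustrative.

```python
from statistics import NormalDist


def percent_below_midpoint(freq_below, freq_on, n):
    """Percent of the total frequency below a score plus 1/2 of those on it."""
    return 100.0 * (freq_below + freq_on / 2.0) / n


def t_score(percent):
    """T-score for a percent strictly between 0 and 100 (mean 50, sigma 10)."""
    z = NormalDist().inv_cdf(percent / 100.0)
    return 50.0 + 10.0 * z


if __name__ == "__main__":
    # 84.13 percent of the area lies below a point 1 sigma above the mean
    print(round(t_score(84.13)))
    # 40 cases below a score and 10 cases on it, out of 100
    print(percent_below_midpoint(40, 10, 100))
```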


TABLE H  Mean σ-distances from the mean, of various percents of a
normal distribution

Average distance from the mean, in terms of σ, of each single percentage
of a normal distribution (decimals omitted). Figures along the top of the
table give the percentage already counted off from the extreme. Figures
down the side of the table give the percentage in the group whose average
distance is wanted.

Examples: The average distance from the mean of the highest 10% is 1.76σ
(entry opposite 10 in first column). The average distance from the mean
of the next 20% is .86σ (entry opposite 20 in column headed 10). The
average distance from the mean of the next 30% is

    [.26 × 20 + (−.13 × 10)] / 30

or .13σ (20% lie to the right of mean and 10% to left, see page 320).


       0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
 1   270  218  196  181  170  160  151  144  137  131  125  120  115  110  106  102   97
 2   244  207  189  175  165  156  148  141  134  128  122  118  112  108  104   99   95
 3   228  198  182  170  160  152  144  137  131  125  120  115  110  106  102   97   94
 4   216  191  177  165  156  148  141  134  128  123  118  113  108  104  100   96   92
 5   210  185  172  161  152  145  138  131  126  120  115  111  106  102   98   94   90
 6   199  179  167  157  149  141  135  129  123  118  113  108  104  100   96   92   88
 7   192  174  163  153  145  138  132  126  121  116  111  106  102   98   94   90   86
 8   186  170  159  150  142  135  128  124  118  113  109  104  100   96   92   88   84
 9   181  165  155  147  139  133  126  121  116  111  106  102   98   94   90   86   83
10   176  161  151  143  136  130  124  119  114  109  104  100   96   92   88   85   81
11   171  158  148  140  134  127  122  116  111  107  102   98   94   90   87   83   79
12   167  154  145  138  131  125  119  114  109  105  100   96   92   89   85   81   78
13   163  151  142  135  128  122  117  112  107  103   99   94   91   87   83   80   76
14   159  147  139  132  126  120  115  110  105  101   97   93   89   85   81   78   75
15   156  144  136  129  123  118  113  108  103   99   95   91   87   83   80   76   73
16   152  141  134  127  121  116  111  106  101   97   93   89   85   82   78   75   71
17   149  139  131  125  119  113  109  104   99   95   91   87   84   80   77   73   70
18   146  136  129  122  117  111  106  102   98   93   89   86   82   78   75   72   68
19   143  133  126  120  114  109  105  100   96   92   88   84   80   77   73   70   67
20   140  131  124  118  112  107  103   98   94   90   86   82   79   75   72   69   65
21   137  128  121  116  110  105  101   96   92   88   84   81   77   74   70   67   64
22   135  126  119  113  108  103   99   95   90   87   83   79   76   72   69   66   62
23   132  124  117  111  106  101   97   92   89   85   81   78   74   71   67   64   61
24   130  121  115  109  104  100   95   91   87   83   80   76   73   69   66   63   60
25   127  119  113  107  102   98   93   89   85   82   78   74   71   68   64   61   58
26   125  117  111  105  101   96   92   88   84   80   76   73   70   66   63   60   57
27   123  115  109  104   99   94   90   86   82   78   75   71   68   65   62   58   55
28   120  113  107  102   97   92   88   84   80   77   73   70   67   63   60   57   54
29   118  111  105  100   95   91   87   83   79   75   72   68   65   62   59   56   53
30   116  109  103   98   93   89   85   81   77   74   70   67   64   60   57   54   51
31   114  107  101   96   92   87   83   79   76   72   69   65   62   59   56   53   50
32   112  105   99   94   90   86   82   78   74   71   67   64   61   58   54   51   48
33   110  103   98   93   88   84   80   76   73   69   66   63   59   56   53   50   47
34   108  101   96   91   86   82   79   75   71   68   64   61   58   55   52   49   46
35   106   99   94   89   85   81   77   73   70   66   63   60   56   53   50   47
36   104   97   92   88   83   80   75   72   68   65   61   58   55   52   49
37   102   96   91   86   82   78   74   70   67   63   60   57   54   51
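
The entries of Table H can be recomputed as the mean of a slice of the unit normal curve: the difference between the ordinates at the two boundaries of the slice, divided by the proportion of cases in it. The Python sketch below rests on that standard result; the function names and the convention of passing proportions (not percents) are choices made here for illustration.

```python
from statistics import NormalDist

_UNIT = NormalDist()  # unit normal curve, mean 0 and sigma 1


def _ordinate_at_upper_tail(tail):
    """Height of the curve at the point cutting off `tail` of the area above it."""
    if tail <= 0.0 or tail >= 1.0:
        return 0.0  # the ordinate vanishes at either extreme
    return _UNIT.pdf(_UNIT.inv_cdf(1.0 - tail))


def mean_sigma_distance(from_extreme, width):
    """Mean distance from the mean (in sigma) of the `width` proportion of
    cases that follows the first `from_extreme` proportion counted from the top."""
    lower = _ordinate_at_upper_tail(from_extreme + width)
    upper = _ordinate_at_upper_tail(from_extreme)
    return (lower - upper) / width


if __name__ == "__main__":
    print(round(mean_sigma_distance(0.00, 0.10), 2))  # highest 10%
    print(round(mean_sigma_distance(0.10, 0.20), 2))  # next 20%
    print(round(mean_sigma_distance(0.30, 0.30), 2))  # next 30%, straddling the mean
```

A slice that straddles the mean, like the third call above, is handled directly, so the weighting of the two parts shown in the caption's last example is not needed when computing this way.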


TABLE I  A table to infer the value of √(1 − r²) from a given value of r


   r    √(1 − r²)      r    √(1 − r²)      r    √(1 − r²)
  .00    1.0000       .34     .9404       .68     .7332
  .01     .9999       .35     .9367       .69     .7238
  .02     .9998       .36     .9330       .70     .7141
  .03     .9995       .37     .9290       .71     .7042
  .04     .9992       .38     .9250       .72     .6940
  .05     .9987       .39     .9208       .73     .6834
  .06     .9982       .40     .9165       .74     .6726
  .07     .9975       .41     .9121       .75     .6614
  .08     .9968       .42     .9075       .76     .6499
  .09     .9959       .43     .9028       .77     .6380
  .10     .9950       .44     .8980       .78     .6258
  .11     .9939       .45     .8930       .79     .6131
  .12     .9928       .46     .8879       .80     .6000
  .13     .9915       .47     .8827       .81     .5864
  .14     .9902       .48     .8773       .82     .5724
  .15     .9887       .49     .8717       .83     .5578
  .16     .9871       .50     .8660       .84     .5426
  .17     .9854       .51     .8602       .85     .5268
  .18     .9837       .52     .8542       .86     .5103
  .19     .9818       .53     .8480       .87     .4931
  .20     .9798       .54     .8417       .88     .4750
  .21     .9777       .55     .8352       .89     .4560
  .22     .9755       .56     .8285       .90     .4359
  .23     .9732       .57     .8216       .91     .4146
  .24     .9708       .58     .8146       .92     .3919
  .25     .9682       .59     .8074       .93     .3676
  .26     .9656       .60     .8000       .94     .3412
  .27     .9629       .61     .7924       .95     .3122
  .28     .9600       .62     .7846       .96     .2800
  .29     .9570       .63     .7766       .97     .2431
  .30     .9539       .64     .7684       .98     .1990
  .31     .9507       .65     .7599       .99     .1411
  .32     .9474       .66     .7513      1.00     .0000
  .33     .9440       .67     .7424
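
Any entry of Table I follows directly from r. A minimal Python sketch, with an illustrative function name:

```python
import math


def root_one_minus_r_squared(r):
    """Return sqrt(1 - r^2) for a correlation coefficient r between -1 and +1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    for r in (0.43, 0.60, 0.97):
        print(r, round(root_one_minus_r_squared(r), 4))
```

The quantity falls slowly at first: an r of .60 still leaves √(1 − r²) at .80, and r must reach about .87 before it drops below .50.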


TABLE J  Coefficients of correlation significant at the 5% level and at
the 1% level for varying degrees of freedom


Degrees of                  Number of Variables
Freedom       2       3       4       5       6       7       9
    1       .997    .999    .999    .999   1.000   1.000   1.000
           1.000   1.000   1.000   1.000   1.000   1.000   1.000
    2       .950    .975    .983    .987    .990    .992    .994
            .990    .995    .997    .998    .998    .998    .999
    3       .878    .930    .950    .961    .968            .978
            .959    .976    .983    .987    .990    .991    .993
    4       .811    .881    .912    .930    .942    .950    .961
            .917    .949    .962    .970    .976    .979    .984
    5       .754    .836    .874    .898    .914    .925    .941
            .874    .917    .937    .949    .957    .963    .971
    6       .707    .795    .839    .867    .886    .900    .920
            .834    .886    .911    .927    .938    .946    .957
    7       .666    .758    .807    .838    .860    .876    .900
            .798    .855    .885    .904    .918    .928    .942
    8       .632    .726    .777    .811    .835    .854    .880
            .765    .827    .860    .882    .898    .909    .926
    9       .602    .697    .750    .786    .812    .832    .861
            .735    .800    .836    .861    .878    .891    .911
TABLE J—(Continued)


Degrees of                  Number of Variables
Freedom       2       3       4       5       6       7       9
   10       .576    .671    .726    .763    .790    .812    .843
            .708    .776    .814    .840    .859    .874    .895
   11       .553    .648    .703    .741    .770    .792    .826
            .684    .753    .793    .821    .841    .857    .880
   12       .532    .627    .683    .722    .751    .774    .809
            .661    .732    .773    .802    .824    .841    .866
   13       .514    .608    .664    .703    .733    .757    .794
            .641    .712    .755    .785    .807    .825    .852
   14       .497    .590    .646    .686    .717    .741    .779
            .623    .694    .737    .768    .792    .810    .838
   15       .482    .574    .630    .670    .701    .726    .765
            .606    .677    .721    .752    .776    .796    .825
   16       .468    .559    .615    .655    .686    .712    .751
            .590    .662    .706    .738    .762    .782    .813
   17       .456    .545    .601    .641    .673    .698    .738
            .575    .647    .691    .724    .749    .769    .800
   18       .444    .532    .587    .628    .660    .686    .726
            .561    .633    .678    .710    .736    .756    .789
   19       .433    .520    .575    .615    .647    .674    .714
            .549    .620    .665    .698    .723    .744    .778
   20       .423    .509    .563    .604    .636    .662    .703
            .537    .608    .652    .685    .712    .733    .767
   21       .413    .498    .552    .592    .624    .651    .693
            .526    .596    .641    .674    .700    .722    .756
   22       .404    .488    .542    .582    .614    .640    .682
            .515    .586    .630    .663    .690    .712    .746
   23       .396    .479    .532    .572    .604    .630    .673
            .505    .574    .619    .652    .679    .701    .736
   24       .388    .470    .523    .562    .594    .621    .663
            .496    .565    .609    .642    .669    .692    .727
   25       .381    .462    .514    .553    .585    .612    .654
            .487    .556    .600    .633    .660    .682    .718
   26       .374    .454    .506    .545    .576    .603    .645
            .478    .546    .590    .624    .651    .673    .709
   27       .367    .446    .498    .536    .568    .594    .637
            .470    .538    .582    .615    .642    .664    .701
   28       .361    .439    .490    .529    .560    .586    .629
            .463    .530    .573    .606    .634    .656    .692
TABLE J—(Continued)

Degrees of                  Number of Variables
Freedom       2       3       4       5       6       7       9
   29       .355    .432    .482    .521    .552    .579    .621
            .456    .522    .565    .598    .625    .648    .685
   30       .349    .426    .476    .514    .545    .571    .614
            .449    .514    .558    .591    .618    .640    .677
   35       .325    .397    .445    .482    .512    .538    .580
            .418    .481    .523    .556    .582    .605    .642
   40       .304    .373    .419    .455    .484    .509    .551
            .393    .454    .494    .526    .552    .575    .612
   45       .288    .353    .397    .432    .460    .485    .526
            .372    .430    .470    .501    .527    .549    .586
   50       .273    .336    .379    .412    .440    .464    .504
            .354    .410    .449    .479    .504    .526    .562
   60       .250    .308    .348    .380    .406    .429    .467
            .325    .377    .414    .442    .466    .488    .523
   70       .232    .286    .324    .354    .379    .401    .438
            .302    .351    .386    .413    .436    .456    .491
   80       .217    .269    .304    .332    .356    .377    .413
            .283    .330    .362    .389    .411    .431    .464
   90       .205    .254    .288    .315    .338    .358    .392
            .267    .312    .343    .368    .390    .409    .441
  100       .195    .241    .274    .300    .322    .341    .374
            .254    .297    .327    .351    .372    .390    .421
  125       .174    .216    .246    .269    .290    .307    .338
            .228    .266    .294    .316    .335    .352    .381
  150       .159    .198    .225    .247    .266    .282    .310
            .208    .244    .270    .290    .308    .322    .351
  200       .138    .172    .196    .215    .231    .246    .271
            .181    .212    .234    .253    .269    .283    .307
  300       .113    .141    .160    .176    .190    .202    .223
            .148    .174    .192    .208    .221    .233    .253
  400       .098    .122    .139    .153    .165    .176    .194
            .128    .151    .167    .180    .192    .202    .220
  500       .088    .109    .124    .137    .148    .157    .174
            .115    .135    .150    .162    .172    .182    .198
 1000       .062    .077    .088    .097    .105    .112    .124
            .081    .096    .106    .115    .122    .129    .141
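
For the two-variable column, the entries of Table J are tied to Table D by the relation t = r√df / √(1 − r²), which rearranges to r = t / √(t² + df). The Python sketch below applies that rearrangement; it covers simple r only (the columns for three or more variables are not reproduced by it), and the function name is illustrative.

```python
import math


def critical_r(t_crit, df):
    """Smallest two-variable r significant at the level whose tabled t is t_crit."""
    return t_crit / math.sqrt(t_crit * t_crit + df)


if __name__ == "__main__":
    # Table D gives t = 2.03 (.05 level) and t = 2.72 (.01 level) for 35 df
    print(round(critical_r(2.03, 35), 3))
    print(round(critical_r(2.72, 35), 3))
```

Because the t values in Table D are rounded to two decimals, results obtained this way may differ from the tabled r in the third decimal for some rows.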


TABLE OF SQUARES AND SQUARE ROOTS
OF THE NUMBERS FROM 1 TO 1000
TABLE OF SQUARES AND SQUARE ROOTS OF THE NUMBERS FROM 1 TO 1000

Number   Square   Square Root     Number   Square   Square Root
   1         1       1.000          51     26 01      7.141
   2         4       1.414
   3         9       1.732
   4        16       2.000
TABLE OF SQUARES AND SQUARE ROOTS—Continued

Number   Square   Square Root     Number   Square   Square Root
101 1 02 01 10.050 151 2 28 01 12.288
102 1 04 04 10.100 152 2 31 04 12.329
103 1 06 09 10.149 153 2 34 09 12.369
104 1 08 16 10.198 154 2 37 16 12.410
105 1 10 25 10.247 155 2 40 25 12.450
106 1 12 36 10.296 156 2 43 36 12.490
107 1 14 49 10.344 157 2 46 49 12.530
108 1 16 64 10.392 158 2 49 64 12.570
109 1 18 81 10.440 159 2 52 81 12.610
110 1 21 00 10.488 160 2 56 00 12.649
111 1 23 21 10.536 161 2 59 21 12.689
112 1 25 44 10.583 162 2 62 44 12.728
113 1 27 69 10.630 163 2 65 69 12.767
114 1 29 96 10.677 164 2 68 96 12.806
115 1 32 25 10.724 165 2 72 25 12.845
116 1 34 56 10.770 166 2 75 56 12.884
117 1 36 89 10.817 167 2 78 89 12.923
118 1 39 24 10.863 168 2 82 24 12.961
119 1 41 61 10.909 169 2 85 61 13.000
120 1 44 00 10.954 170 2 89 00 13.038
121 1 46 41 11.000 171 2 92 41 13.077
122 1 48 84 11.045 172 2 95 84 13.115
123 1 51 29 11.091 173 2 99 29 13.153
124 1 53 76 11.136 174 3 02 76 13.191
125 1 56 25 11.180 175 3 06 25 13.229
126 1 58 76 11.225 176 3 09 76 13.266
127 1 61 29 11.269 177 3 13 29 13.304
128 1 63 84 11.314 178 3 16 84 13.342
129 1 66 41 11.358 179 3 20 41 13.379
130 1 69 00 11.402 180 3 24 00 13.416
131 1 71 61 11.446 181 3 27 61 13.454
132 1 74 24 11.489 182 3 31 24 13.491
133 1 76 89 11.533 183 3 34 89 13.528
134 1 79 56 11.576 184 3 38 56 13.565
135 1 82 25 11.619 185 3 42 25 13.601
136 1 84 96 11.662 186 3 45 96 13.638
137 1 87 69 11.705 187 3 49 69 13.675
138 1 90 44 11.747 188 3 53 44 13.711
139 1 93 21 11.790 189 3 57 21 13.748
140 1 96 00 11.832 190 3 61 00 13.784
141 1 98 81 11.874 191 3 64 81 13.820
142 2 01 64 11.916 192 3 68 64 13.856
143 2 04 49 11.958 193 3 72 49 13.892
144 2 07 36 12.000 194 3 76 36 13.928
145 2 10 25 12.042 195 3 80 25 13.964
146 2 13 16 12.083 196 3 84 16 14.000
147 2 16 09 12.124 197 3 88 09 14.036
148 2 19 04 12.166 198 3 92 04 14.071
149 2 22 01 12.207 199 3 96 01 14.107
150 2 25 00 12.247 200 4 00 00 14.142
TABLE OF SQUARES AND SQUARE ROOTS—Continued
TABLE OF SQUARES AND SQUARE ROOTS—Continued

Number   Square   Square Root     Number   Square   Square Root
301 9 06 01 17.349 351 12 32 01 18.735
302 9 12 04 17.378 352 12 39 04 18.762
303 9 18 09 17.407 353 12 46 09 18.788
304 9 24 16 17.436 354 12 53 16 18.815
305 9 30 25 17.464 355 12 60 25 18.841
306 9 36 36 17.493 356 12 67 36 18.868
307 9 42 49 17.521 357 12 74 49 18.894
308 9 48 64 17.550 358 12 81 64 18.921
309 9 54 81 17.578 359 12 88 81 18.947
310 9 61 00 17.607 360 12 96 00 18.974
311 9 67 21 17.635 361 13 03 21 19.000
312 9 73 44 17.664 362 13 10 44 19.026
313 9 79 69 17.692 363 13 17 69 19.053
314 9 85 96 17.720 364 13 24 96 19.079
315 9 92 25 17.748 365 13 32 25 19.105
316 9 98 56 17.776 366 13 39 56 19.131
317 10 04 89 17.804 367 13 46 89 19.157
318 10 11 24 17.833 368 13 54 24 19.183
319 10 17 61 17.861 369 13 61 61 19.209
320 10 24 00 17.889 370 13 69 00 19.235
321 10 30 41 17.916 371 13 76 41 19.261
322 10 36 84 17.944 372 13 83 84 19.287
323 10 43 29 17.972 373 13 91 29 19.313
324 10 49 76 18.000 374 13 98 76 19.339
325 10 56 25 18.028 375 14 06 25 19.365
326 10 62 76 18.055 376 14 13 76 19.391
327 10 69 29 18.083 377 14 21 29 19.416
328 10 75 84 18.111 378 14 28 84 19.442
329 10 82 41 18.138 379 14 36 41 19.468
330 10 89 00 18.166 380 14 44 00 19.494
331 10 95 61 18.193 381 14 51 61 19.519
332 11 02 24 18.221 382 14 59 24 19.545
333 11 08 89 18.248 383 14 66 89 19.570
334 11 15 56 18.276 384 14 74 56 19.596
335 11 22 25 18.303 385 14 82 25 19.621
336 11 28 96 18.330 386 14 89 96 19.647
337 11 35 69 18.358 387 14 97 69 19.672
338 11 42 44 18.385 388 15 05 44 19.698
339 11 49 21 18.412 389 15 13 21 19.723
340 11 56 00 18.439 390 15 21 00 19.748
341 11 62 81 18.466 391 15 28 81 19.774
342 11 69 64 18.493 392 15 36 64 19.799
343 11 76 49 18.520 393 15 44 49 19.824
344 11 83 36 18.547 394 15 52 36 19.849
345 11 90 25 18.574 395 15 60 25 19.875
346 11 97 16 18.601 396 15 68 16 19.900
347 12 04 09 18.628 397 15 76 09 19.925
348 12 11 04 18.655 398 15 84 04 19.950
349 12 18 01 18.682 399 15 92 01 19.975
350 12 25 00 18.708 400 16 00 00 20.000
TABLE OF SQUARES AND SQUARE ROOTS—Continued
Square Square Root


16 08 01 
16 16 04 
16 24 09 
1632 16 
16 40 25 


16 48 36 


16 89 21 
16 97 44 
17 05 69 
17 13 96 
17 2225 


17 30 56 
17 38 89 
17 47 24 
17 55 61 
17 6400 


177241 


18 1476 
18 23 29 
183184 
184041 
184900 


185761 
186624 
187489 
18 83 56 
18 92 25 


19 00 96 
19 09 69 
19 18 44 
192721 
1936 00 


194181 
19 53 64 
19 62 49 
19 7136 
19 80 25 


19 89 16 
19 98 09 
20 07 04 
20 16 01 
20 25 00 


451 


Number Square


203401 
20 43 04 
20 52 09 
20 6116 
20 70 25 


20 79 36 
20 88 49 
20 97 64 
210681 
211600 


212521 
213444 
21 43 69 
215296 
216225 


217156 
218089 
219024 
219961 
22 09 00 


221841 


231361 
232324 
23 3289 
23 4256 
23 5225 


23 6196 
237169 
238144 
23 9121 
24 01 00 


24 10 81 
24 20 64 
24 30 49 
24 40 36 
24 5025 


Square Root 


21.237 
21.200 
21.284 
21.307 
21.831 


21.354 
21.378 
21.401 
21.424 
21.448 


21.471 
21.494 


TABLE OF SQUARES AND SQUARE ROOTS—Continued


Square 


251001 
25 2004 
25 30 09 
254016 
25 5025 


25 60 36 
25 70 49 
25 80 64 
25 90 81 
26 01 00 


26 1121 
26 21 44 
26 31 69 
26 41 96 
26 5225 


26 6256 
26 7289 
26 8324 
26 93 61 
27 0400 


271441 
272484 
273529 
27 4576 
27 56 25 


27 6676 
27 77 29 
27 87 84 
27 98 41 
28 09 00 


28 19 61 
28 30 24 
28 40 89 
28 51 56 
28 62 25 


28 72 96 
28 83 69 
28 94 44 
29 05 21 
29 16 00 


29 26 81 
29 37 64 
29 48 49 
29 59 36 
29 70 25 


29 81 16 
29 92 09 
30 03 04 
301401 
302500 


Number Square Square Root

551 30 36 01 23.473 
552 30 47 04 23.495 
553 30 58 09 23.516 
554 30 69 16 23.537 
555 30 80 25 23.558 
556 30 9136 23.580 
557 31 02 49 23.601 
558 311364 23.622 
559 312481 23.643 
560 3136 00 23.664 
561 31 47 21 23.685
562 31 58 44 23.707 
563 31 69 69 23.728 
564 31 80 96 23.749 
565 319225 23.770 
566 320356 23.791 
567 321489 23.812 
568 322624 23.833 
569 323761 23.854 
570 32 49 00 23.875 
571 32 60 41 23.896 
572 327184 23.917 
573 32 83 29 23.937 
574 329476 23.958 
575 33 06 25 23.979 
576 33 17 76 24.000
577 33 29 29 24.021
578 33 40 84 24.042
579 33 52 41 24.062
580 33 64 00 24.083
581 33 75 61 24.104 
582 33 87 24 24.125 
583 33 98 89 24.145 
584 34 10 56 24.166 
585 34 22 25 24.187
586 343396 24.207 
587 34 45 69 24.228 
588 34 5744 24.249 
589 34 69 21 24.269 
590 34 81 00 24.290
591 349281 24.310 
592 35 04 64 24.331 
593 35 16 49 24.352 
594 35 28 36 24.372
595 35 40 25 24.393 
596 35 52 16 24.413 
597 35 64 09 24.434 
598 35 76 04 24.454 
599 35 88 01 24.474 
600 36 00 00 24.495 
TABLE OF SQUARES AND SQUARE ROOTS—Continued


Number Square Square Root Number Square Square Root 
601 36 12 01 24.515 651 423801 25.515 
602 36 24 04 24.536 652 42 51 04 25.534 
603 36 36 09 24.556 653 42 64 09 25.554 
604 36 48 16 24.576 654 427716 25.573 
605 36 60 25 24.597 655 429025 25.593 
606 36 72 36 24.617 656 43 03 36 25.612
607 36 84 49 24.637 657 43 16 49 25.632
608 36 96 64 24.658 658 43 29 64 25.652
609 37 08 81 24.678 659 43 42 81 25.671
610 37 21 00 24.698 660 43 56 00 25.690
611 37 33 21 24.718 661 43 69 21 25.710
612 37 45 44 24.739 662 43 82 44 25.729 
613 37 57 69 24.759 663 43 95 69 25.749 
614 37 69 96 24.779 664 44 08 96 25.768 
615 37 82 25 24.799 665 44 22 25 25.788 
616 37 94 56 24.819 666 44 35 56 25.807
617 38 06 89 24.839 667 44 48 89 25.826 
618 38 19 24 24.860 668 44 62 24 25.846 
619 383161 24.880 669 44 75 61 25.865 
620 38 44 00 24.900 670 44 89 00 25.884 
621 38 56 41 24.920 671 45 02 41 25.904 
622 38 68 84 24.940 672 45 15 84 25.923
623 38 81 29 24.960 673 45 29 29 25.942
624 38 93 76 24.980 674 45 42 76 25.962
625 39 06 25 25.000 675 45 56 25 25.981
626 39 18 76 25.020 676 45 69 76 26.000 
627 39 31 29 25.040 677 45 83 29 26.019 
628 39 43 84 25.060 678 45 96 84 26.038 
629 39 56 41 25.080 679 46 10 41 26.058 
630 39 69 00 25.100 680 46 24 00 26.077 
631 398161 25.120 681 463761 26.096 
632 39 94 24 25.140 682 46 51 24 26.115 
633 40 06 89 25.159 683 46 64 89 26.134
634 40 19 56 25.179 684 46 78 56 26.153 
635 40 32 25 25.199 685 46 92 25 26.173 
636 40 44 96 25.219 686 47 05 96 26.192 
637 40 57 69 25.239 687 47 19 69 26.211 
638 40 70 44 25.259 688 47 33 44 26.230 
639 40 83 21 25.278 689 474721 26.249 
640 40 96 00 25.298 690 47 61 00 26.268
641 41 08 81 25.318 691 47 74 81 26.287
642 41 21 64 25.338 692 47 88 64 26.306
643 41 34 49 25.357 693 48 02 49 26.325
644 41 47 36 25.377 694 48 16 36 26.344
645 41 60 25 25.397 695 48 30 25 26.363
646 41 73 16 25.417 696 48 44 16 26.382
647 41 86 09 25.436 697 48 58 09 26.401
648 41 99 04 25.456 698 48 72 04 26.420
649 42 12 01 25.475 699 48 86 01 26.439
650 42 25 00 25.495 700 49 00 00 26.458
TABLE OF SQUARES AND SQUARE ROOTS—Continued


Number Square Square Root Number Square Square Root
701 49 14 01 26.476 751 56 40 01 27.404
702 49 28 04 26.495 752 56 55 04 27.423
703 49 42 09 26.514 753 56 70 09 27.441
704 49 56 16 26.533 754 56 85 16 27.459
705 49 70 25 26.552 755 57 00 25 27.477
706 49 84 36 26.571 756 57 15 36 27.495
707 49 98 49 26.589 757 57 30 49 27.514
708 50 12 64 26.608 758 57 45 64 27.532
709 50 26 81 26.627 759 57 60 81 27.550
710 50 41 00 26.646 760 57 76 00 27.568
711 50 55 21 26.665 761 57 91 21 27.586
712 50 69 44 26.683 762 58 06 44 27.604
713 50 83 69 26.702 763 58 21 69 27.622
714 50 97 96 26.721 764 58 36 96 27.641
715 51 12 25 26.739 765 58 52 25 27.659
716 51 26 56 26.758 766 58 67 56 27.677
717 51 40 89 26.777 767 58 82 89 27.695
718 51 55 24 26.796 768 58 98 24 27.713
719 51 69 61 26.814 769 59 13 61 27.731
720 51 84 00 26.833 770 59 29 00 27.749
721 51 98 41 26.851 771 59 44 41 27.767
722 52 12 84 26.870 772 59 59 84 27.785
723 52 27 29 26.889 773 59 75 29 27.803
724 52 41 76 26.907 774 59 90 76 27.821
725 52 56 25 26.926 775 60 06 25 27.839
726 52 70 76 26.944 776 60 21 76 27.857
727 52 85 29 26.963 777 60 37 29 27.875
728 52 99 84 26.981 778 60 52 84 27.893
729 53 14 41 27.000 779 60 68 41 27.911
730 53 29 00 27.019 780 60 84 00 27.928
731 53 43 61 27.037 781 60 99 61 27.946
732 53 58 24 27.055 782 61 15 24 27.964
733 53 72 89 27.074 783 61 30 89 27.982
734 53 87 56 27.092 784 61 46 56 28.000
735 54 02 25 27.111 785 61 62 25 28.018
736 54 16 96 27.129 786 61 77 96 28.036
737 54 31 69 27.148 787 61 93 69 28.054
738 54 46 44 27.166 788 62 09 44 28.071
739 54 61 21 27.185 789 62 25 21 28.089
740 54 76 00 27.203 790 62 41 00 28.107
741 54 90 81 27.221 791 62 56 81 28.125
742 55 05 64 27.240 792 62 72 64 28.142
743 55 20 49 27.258 793 62 88 49 28.160
744 55 35 36 27.276 794 63 04 36 28.178
745 55 50 25 27.295 795 63 20 25 28.196
746 55 65 16 27.313 796 63 36 16 28.213
747 55 80 09 27.331 797 63 52 09 28.231
748 55 95 04 27.350 798 63 68 04 28.249
749 56 10 01 27.368 799 63 84 01 28.267
750 56 25 00 27.386 800 64 00 00 28.284


TABLE OF SQUARES AND SQUARE ROOTS—Continued


Number Square Square Root Number Square Square Root
801 641601 28.302 851 724201 29.172 
802 643204 28.320 852 72 59 04 29.189 
803 64 48 09 28.337 853 72 76 09 29.206
804 646416 28.355 854 72 93 16 29.223 
805 64 80 25 28.373 855 73 1025 29.240 
806 64 96 36 28.390 856 73 27 36 29.257 
807 65 12 49 28.408 857 73 44 49 29.275 
808 65 28 64 28.425 858 73 6164 29.292 
809 654481 28.443 859 737881 29.309 
810 65 61 00 28.460 860 73 96 00 29.326 
811 65 77 21 28.478 861 741321 29.343 
812 65 93 44 28.496 862 74 30 44 29.360 
813 66 09 69 28.513 863 74 47 69 29.377 
814 66 25 96 28.531 864 74 64 96 29.394
815 66 42 25 28.548 865 74 82 25 29.411 
816 66 58 56 28.566 866 74 99 56 29.428 
817 66 74 89 28.583 867 75 16 89 29.445 
818 66 91 24 28.601 868 7534 24 29.462 
819 67 07 61 28.618 869 755161 29.479 
820 67 24 00 28.636 870 75 69 00 29.496 
821 67 40 41 28.653 871 75 86 41 29.513 
822 67 56 84 28.671 872 76 03 84 29.530 
823 67 73 29 28.688 873 76 2129 29.547 
824 67 8976 28.705 874 76 38 76 29.563 
825 68 06 25 28.723 875 76 56 25 29.580 
826 68 2276 28.740 876 76 73 76 29.597 
827 68 39 29 28.758 877 76 9129 29.614 
828 68 55 84 28.775 878 77 08 84 29.631
829 68 72 41 28.792 879 77 26 41 29.648
830 68 89 00 28.810 880 77 44 00 29.665
831 69 05 61 28.827 881 77 61 61 29.682
832 69 22 24 28.844 882 77 79 24 29.698
833 69 38 89 28.862 883 77 96 89 29.715
834 69 55 56 28.879 884 78 14 56 29.732
835 69 72 25 28.896 885 78 32 25 29.749
836 69 88 96 28.914 886 78 49 96 29.766
837 70 05 69 28.931 887 78 67 69 29.783
838 70 22 44 28.948 888 78 85 44 29.799
839 70 39 21 28.965 889 79 03 21 29.816 
840 70 56 00 28.983 890 792100 29.833 
841 70 72 81 29.000 891 79 38 81 29.850 
842 70 89 64 29.017 892 79 56 64 29.866 
843 71 06 49 29.034 893 79 74 49 29.883
844 71 23 36 29.052 894 79 92 36 29.900
845 71 40 25 29.069 895 80 10 25 29.916
846 71 57 16 29.086 896 80 28 16 29.933
847 71 74 09 29.103 897 80 46 09 29.950
848 71 91 04 29.120 898 80 64 04 29.967
849 72 08 01 29.138 899 80 82 01 29.983
850 72 25 00 29.155 900 81 00 00 30.000 


TABLE OF SQUARES AND SQUARE ROOTS—Continued


Number Square Square Root Number Square Square Root 
901 81 18 01 30.017 951 90 44 01 30.838
902 81 36 04 30.033 952 90 63 04 30.854
903 81 54 09 30.050 953 90 82 09 30.871
904 81 72 16 30.067 954 91 01 16 30.887
905 81 90 25 30.083 955 91 20 25 30.903
906 82 08 36 30.100 956 91 39 36 30.919
907 82 26 49 30.116 957 91 58 49 30.935
908 82 44 64 30.133 958 9177 64 30.952 
909 82 62 81 30.150 959 91 96 81 30.968 
910 82 81 00 30.166 960 92 16 00 30.984 
911 829921 30.183 961 92 35 21 31.000 
912 83 17 44 30.199 962 92 54 44 31.016 
913 83 35 69 30.216 963 92 73 69 31.032 
914 83 53 96 30.232 964 92 92 96 31.048 
915 83 72 25 30.249 965 931225 31.064 
916 83 90 56 30.265 966 93 31 56 31.081
917 84 08 89 30.282 967 93 50 89 31.097
918 84 27 24 30.299 968 93 70 24 31.113 
919 84 45 61 30.315 969 93 89 61 31.129 
920 84 64 00 30.332 970 94 09 00 31.145 
921 848241 30.348 971 942841 31.161 
922 85 00 84 30.364 972 94 47 84 31.177 
923 85 19 29 30.381 973 94 67 29 31.193 
924 85 37 76 30.397 974 94 86 76 31.209
925 85 56 25 30.414 975 95 06 25 31.225
926 85 74 76 30.430 976 95 25 76 31.241
927 85 93 29 30.447 977 95 45 29 31.257
928 86 11 84 30.463 978 95 64 84 31.273
929 86 30 41 30.480 979 95 84 41 31.289
930 86 49 00 30.496 980 96 04 00 31.305 
931 86 67 61 30.512 981 96 23 61 31.321 
932 86 86 24 30.529 982 96 43 24 31.337 
933 87 04 89 30.545 983 96 62 89 31.353 
934 87 23 56 30.561 984 96 82 56 31.369 
935 87 42 25 30.578 985 97 02 25 31.385 
936 87 60 96 30.594 986 97 21 96 31.401 
937 87 79 69 30.610 987 97 41 69 31.417 
938 87 98 44 30.627 988 97 61 44 31.432 
939 88 17 21 30.643 989 97 81 21 31.448 
940 88 36 00 30.659 990 98 01 00 31.464 
941 88 54 81 30.676 991 98 20 81 31.480 
942 88 73 64 30.692 992 98 40 64 31.496 
943 88 92 49 30.708 993 98 60 49 31.512 
944 89 11 36 30.725 994 98 80 36 31.528 
945 89 30 25 30.741 995 99 00 25 31.544 
946 89 49 16 30.757 996 99 20 16 31.559 
947 89 68 09 30.773 997 99 40 09 31.575 
948 89 87 04 30.790 998 99 60 04 31.591 
949 90 06 01 30.806 999 99 80 01 31.607 
950 90 25 00 30.822 1000 1 00 00 00 31.623 
INDEX 


Accuracy, standards of, in computa- 
tion, 20-24 

Ackerson, L., 340 

Actuarial prediction, through correlation, 
164 

Adkins, D. C., 309, 351 

Analysis of variance: principles of, 
268-273; how variances are analyzed, 
269-273; in determining significance 
of difference between independent 
means, 273-284; between 
correlated means, 285-296 

Anastasi, A., 346, 400 

Anderson, J. E., 350 

Arkin, H., 454 

Array, in a correlation table, 130 

Attenuation: correction of correlation 
coefficient for, 346-347; assumptions 
underlying, 347 

Average: definition of, 28; of correlation 
coefficients, 146-147. See also 
Mean, Median, and Mode. 


Bar diagram, 80-82 

Barlow’s Tables, 454 

Beta coefficients: in partial and multiple 
correlation, 393-394, 396-397; as 
“weights,” 393; calculation of, in 
Wherry-Doolittle method, 415-417 

Bias in sampling. See Sampling 

Binomial expansion: use in probabil- 
ity, 87-92; graphic representation 
of, 91 

Bi-serial correlation, 356-362; calculation 
of r_bis, 357-359; SE of r_bis, 
359; alternate formula for, 360-361; 
point bi-serial coefficient, 361-362 

Brigham, C. C., 214 

Burks, B. S., 420 


Central tendency, measures of, 28. 
See also Mean, Median, and Mode 


Chesire, L., 365 
Chi-square test, 254; as a measure of 
divergence from the null hypothesis, 
255-257, and from the normal distribution, 
257-258; when table entries 
are small, 258-261; when table 
entries are in percentages, 261-262; 
in contingency tables, 262-265; additive 
property of, 265 
Classification of measures into a frequency 
distribution, 4-9 
Class-interval: definition of, 
methods of expressing, 7-8; 
midpoint of, 7-8; limits of, 7-8 
Clayton, B., 340 
Coefficient: of variation, or V, 57-60; 
of alienation, 174-176; of determination, 
in the interpretation of r, 
176-178 
Coefficient of correlation: meaning of, 
126-134; as a ratio, 126-128; represented 
graphically, 131; computation 
of, deviations from assumed means, 
134-139; computation of, deviations 
from means, 139-142; 
computation of, deviations from ...; 
averaging of, 146-147; 
effect of variability upon, 166-167; 
interpretations of, 172-178; reliability 
of, 197-201 


Colton, R., 45 
Column diagram. See Histogram 
Comparison: of obtained distribution 
with normal probability curve, 101-103; 
of groups in terms of overlapping, 
107-108. See also Chi-square, 
Skewness, and Kurtosis 
Computation, rules for, 20-24 
Confidence-intervals for the true 
mean, meaning of, 187-189 
Conrad, H. S., 176 
Contingency, coefficient of (C), 368-371; 
relation of C to chi-square, 
368; methods of computing C, 3 
370; comparison of C with r, 371 
Continuous series: definition of, 2-3; 
scores in, 3-4; tabulation of measures 
in, 4-9 

Correlation, linear, 122, 131-134; positive, 
negative, and zero, 122-191; 
expressed as a ratio, 126-127; construction 
of table, 128-130; graphic 
representation of, 131-134; product-moment 
method in, 134-139; from 
ungrouped data, 139-146; difference 
formula in, 145-146; effect of errors 
of observation upon, 346-347; rank 
difference method of computing, 
353-356; spurious, 399-401. See also 
Partial correlation and Multiple 
correlation 

Correlation-ratio (eta), in non-linear 
relationship, 372 

Covariance, analysis of, 289-295 

Criterion: value of, in determining 
the validity of tests, 345-346; pre- 
diction of by multiple regression 
equation, 391-394 

Critical ratio, definition of, 215. See 
also t-test 

Cumulative frequencies, method of 
computing, 63-64 

Cumulative frequency graph: con- 
struction of, 63-65; smoothing of, 
76-77 

Cureton, E. E., 389 

Curvilinear relationship, 371-373 


Data, continuous and discrete, 2-4 

Davis, F. B., 351 

Deciles. See Percentiles 

Degrees of freedom: meaning of, 193- 
194; in analysis of variance, 278, 
283, 287 

Deviation. See Quartile deviation, 
Mean deviation, and Standard devi- 
ation 

Differences, significance of: between 
means, 213-232; between medians, 
232; between standard deviations, 
232-236; between percentages, 236-239; 
between r’s, 239-240. See also 
Standard error and Probable error 

Discrete series, 2 

Distribution, frequency. See Fre- 
quency distribution 

Dunlap, J. W., 196, 389 


Edgerton, H. A., 161 
Edwards, A. L., 268, 299, 453 


Elliott, R. M., 405 

Equivalent groups, method of, 228- 
230 

Error, curve of, 85-87. See also Nor- 
mal curve 

Errors: of sampling, 201-208; constant, 
209. See also Probable and 
Standard errors 

Experimental hypotheses: testing of, 
247-254; null hypothesis, 247-248 

Ezekiel, M., 389, 418 


Ferguson, G. A., 350 

Fertig, J. W., 298 

Fiduciary limits (Fisher), 189; probability, 
1 

Fisher, R. A., 189, 198, 203, 249, 270, 
428, 453, 454 

Flanagan, J. C., 351 

Franzen, R., 59 

Frequency distribution: construction 
of, 4-8; graphical representation of, 
9-20; normalizing a, 307-311; rectangular 
and normal, 

Frequency polygon: construction of, 
11-12; smoothing of, 14-16; comparison 
with a histogram, 18, 20; 
comparison of two, on same axes, 
18, 19 


Froelich, G. J., 336 
F-test: in comparing two σ's, 
234; in analysis of variance, 
281 


Garrett, H. E., 186, 280, 400 
Goulden, C. H., 270 
Graphic representation: 
9-10; of correlation coefficient. 
See also Frequency polygon, Histogram, 
Cumulative frequency graph, 
Percentile curve or Ogive, 
Line graph, Bar diagram. 
Grouping: in tabulating a frequency 
distribution, 4-9; assumptions in, 
8-9 
Guilford, J. P., 351, 453 
Gulliksen, H., 348, 95 


Hartshorne, H., 236 

Hawkes, Lindquist, and Mann, 115, 

Heterogeneity, effect of: upon correlation, 
166-167; upon the reliability 
coefficient, 344-345 

Hillegas, M. B., 317 

Histogram: definition of, 16-17; com- 
parison of, with frequency polygon, 
18, 20 

Holtzman, W. H., 189, 224 

Holzinger, K. J., 340 

Homogeneity, 43; effect of, upon cor- 
relation, 166-167 

Hull, C. L., 117, 324 


Inferences, errors in, 219-222 

Interaction, in analysis of variance, 
287 

Interval. See Class-interval 

Item analysis: problem of, 349; and 
selection, 349; and difficulty of, 350; 
and validity, 350-351 


Jackson, J. D., 340 
Johnson, P. O., 241, 453 
Jones, D. C., 94 

Jones, H. E., 173 
Jones, L. V., 217 


Kelley, T. L., 99, 342, 368 

Kelly, E. L., 340 

Kendall, M. G., 371, 395 

Kuder, G. F., 335 

Kurtosis: calculation of, 100-101; 
standard error of, 242-243 

Kurtz, A. K., 196 


Levels of confidence, 186-187 
Lewis, D., 190, 254 

Likert, R., 319 

Lindquist, E. F., 270, 453 
Line graphs, 78-80 

Long, J. A., 351, 361 


Martin, G. B., 176 
Matched groups, method of, 230-233 
May, M. A., 286, 380 
McCall, W. A., 308 
McNemar, Q., 85, 94, 219, 238, 268, 
294, 343, 453 

Mean, arithmetic: calculation of, 
from ungrouped scores, 28, from 
frequency distribution, 29-31, by 
“assumed mean” method, 36-39; 
when to use, 39; reliability of, 182-185; 
limits of accuracy for, 
187 

Mean deviation, or MD: calculation 
of, from ungrouped data, 48-49; 
from grouped data, 49-50; when to 
use, 61 

Median: calculation of, from ungrouped 
scores, 31-32; from frequency 
distribution, 32-34; in special 
cases, 34-35; when to use, 39; 
reliability of, 194 

Merrill, M. A., 170, 343 

Method: single group, 225-228; equivalent 
groups, 228-230; matched 
groups, 230-232 

Midpoint of interval, as representative 
of all of the scores on the interval, 
7-8 

Mode: calculation of, 35-36; when to 
use, 40 

Mode, E. B., 453 

Moore, T. V., 420 

Morgan, J. J. B., 248 

Moving average, use of in smoothing 
a curve, 14-16 

Multiple coefficient of correlation, R, 
380; computation of, in a three-variable 
problem, 387; formulas for, 
395-397; beta coefficients in, 396- 
397; significance of, 397; “shrink- 
age” in, 407; limitations to use of, 
419-420 

Multiple regression equations: for n 
variables, 391; for three variables 
(special form), 391-393; partial re- 
gression coefficients (b), 392-393; 
beta coefficients, 393-394 


Non-linear relationship, measurement 
of, 371-373 

Normal probability curve, 85-87; illustrations 
of, 85-86; deduction 
from binomial expansion, 90-92; in 
psychological measurement, 92-94; 
equation of, 94; properties of, 94-96; 
constants of, 94, 97; comparison 
of obtained distribution with, 101-103; 
use in solution of a variety of 
problems, 103-113; in scaling test 
scores, 323-326; in scaling judgments, 
326-327 

Normality: divergence of frequency 
distribution from, 113-118; normalizing 
a frequency distribution, 307-311; 
T-scores, 307-313 

Null hypothesis: in determining significance 
of coefficient of correlation, 
199-201; in testing reliability of differences, 
213; advantages of, 246-247; 
testing of, against direct determination 
of probable outcomes, 
248-251; testing of, against normal 
curve frequencies, 251-253 

Numbers: rounded, 20-21; exact and 
approximate, 22 


Ogburn, W. F., 420 

Ogive: construction of, 69-71; per- 
centiles and percentile ranks from, 
70-75; uses of, 73-75; smoothing of, 
76-77 

Order of merit, ranks, 323-327; 
changing into numerical scores, 
table for, 324 

Otis, A. S., 334 

Overlapping, in the measurement of 
groups, 107-108 


Parallel forms method, in reliability 
of test scores, 333-334 

Parameter, definition of, 181 

Partial correlation: value of, in analysis, 
378-379; illustrations of, in a 
three-variable problem, 380-387; notation 
in, 387-388; formulas for partial 
r's, 387-389; significance of, 389; 
limitations to the use of, 419-420 

Paterson, D. G., 315, 405 

Pearson, K., 128 

Percentages: standard error of, 196; 
standard error of the difference be- 
tween two, 237-238 

Percentile ranks (PR): computation 
of, 68-69; construction of curve of, 
69-73; graphic method of finding 
ranks, 71-73; uses of curve of, 73- 
78; norms, 75-77; scale, use of, in 
combining test scores, 313-315; 
scale, disadvantages of, 315 

Percentiles: calculation of, 66-69; 
graphic method of finding, 70-74 

Perry, N. C., 368 


Peters, C. C., 168, 365, 371 

Phi-coefficient, calculation of, 367-368; 
relation to χ², 368 

Pintner, R., 315, 400 

Predictions: accuracy of, from regres- 
sion equations, 161-163; accuracy of 
group, 163-166; “regression effect” 
in, 171-172; from multiple regres- 
sion equations, 386-387, 394-395 


Probability, elementary principles of, 
87-92 

Probable error: relation to Q, 47; relation 
to σ, 97 

Product-moment method of finding r, 
134-139 


Quartile deviation (Q): calculation of, 
44-48; when to use, 61; reliability 
of, 195 

Quartiles, Q1 and Q3, computation of, 
44-48 


Range, as a measure of variability, 
44; when to use, 60-61; influence 
upon the coefficient of correlation, 
166-167 

Rank-difference method of computing 
correlations, 354-356; when to use, 
356 

Ranks, transmutation of, into units of 
amount, 323-327 

Rational equivalence, method of, in 
test reliability, 335-337 

Rectangular distribution, and normal, 
313-315 

Regression coefficient, 154-156; in 
partial and multiple correlation, 
392-394 

Regression effect, reasons for, 171-172 

Regression equations, 151-154; in deviation 
form, 154-157; in correlation 
table, 157-158; in score form, 
159-160; value of, in prediction and 
control, 160-161; limitations to use 
of, 162-166; formulas for, in partial 
and multiple correlation, 

Relative variability, coefficient of, 57-60. 
See also Coefficient of variation 

Reliability: meaning of, 180-183; of 
the mean, 182-185; in small samples, 
189-193; of the median, 194; 
of Q, 195; of σ, 195; of a percentage, 
196-197; sampling and reliability, 
201-209; of differences, independent 
means, 213-216, 222-225; of differences, 
correlated means, 225-232; of 
test scores, 332-344; index of, 
342; dependence of coefficient of, 
upon the size and variability of 
group, 344 

Remmers, H. H., 340 


Rhine, J. B., 248 

Richardson, M. W., 335, 351, 361 
Rider, P. R., 453 
Ruch, G. M., 340 

Russell, J. T., 165 


Saffir, M., 365 

Sampling: random, 202-205; strati- 
fied, 205-206; incidental, 206; pur- 
posive, 207; size of, 207-208; and 
errors of measurement, 208; bias 
and constant errors in, 209 

Sandiford, P., 351, 361 

Scale, definition of, 1 

Scaling: of test items (σ-scaling), 
301-305; of total scores, 305-307; of 
judgments, 316-318; of answers to 
a questionnaire, 319-322; of ratings, 
322-323. See also Percentile scale, 
T-scale 

Scatter diagram, 128-129 

Scores: definition of, 1; in continuous 
and in discrete series, 2-8 

Selection of tests in a battery, factors 
in, 397-399 

Semi-interquartile range, 44-48. See 
also Quartile deviation 

Shartle, C. L., 161, 174, 345, 404 

Shock, N. W., 340 

Sigma scores, and standard scores, 
305-307 

Significance: meaning of, 212; levels 
of, 216-217; two- and one-tailed 
tests of, 217-219; table for determining, 
427; .05 and .01 tables of, 
for r, 437-439 

Significant figures, 21 

Skewness: measurement of, 97-99; 
causes of, 114-118; standard error of 
measure of, 241-242 

Snedecor, G. W., 193, 270, 453 

Spearman-Brown prophecy formula in 
test reliability, 339-341 

Split-half method, in reliability of test 
scores, 334-335 

Spurious correlation, 399; arising from 
heterogeneity, 399-400; of indices, 
400-401; of averages, 401 

Stalnaker, J. L., 361 

Standard deviation or σ, 50; calculation 
of, 51-52; calculation of, by 
Short Method, 52-54; calculation of, 
from raw scores, 54-56; in special 
cases, 56-57; when to use, 61; reliability 
of, 194-195; estimation of 
true value of, 347-348; formulas for, 
in partial correlation, 389-391 

Standard error, of a mean, in large 
samples, 182; in small samples, 190; 
of a median, 194; of σ, 195; of Q, 
195; of a percentage, 196; of r, 197; 
of the difference between means, 
213-232; of the difference between 
medians, 232; of the difference between 
r's, 239 

Standard error of an obtained score, 
342-343 

Standard error, of estimate, 161-163; 
in the interpretation of r, 174-175; 
in partial and multiple correlation, 
394-395 

Standard scores, 305-307; compared 
with T-scores, 312-313 

Statistic, definition of, 181 

Stead, W. H., 161, 174, 345, 404 

Student's distribution, table of, 427 

Symonds, P. M., 145 


Tabulation: of measures in a frequency 
distribution, 4-9; in a correlation 
table, 128-130 

Taylor, H. C., 165 

Terman, L. M., 170, 343 

Test items: relative difficulty of, 
302-305; analysis of, 349-350 

Test-retest method, in reliability of 
test scores, 333 

Test scores, factors affecting reliabil- 
ity of, 337-341 

Tetrachoric correlations, 362; calcu- 
lation of, 362-365; diagrams in, 365; 
SE of, 365-366; use of, in test eval- 
uation, 366-367 

Thomson, G. H., 400 

Thorndike, E. L., 86, 115 

Thorndike, R. L., 171, 387, 398 

Thurstone, L. L., 59, 316, 365 

Transmutation of measures, 316-327; 
of judgments, 316-323; of orders of 
merit, 323-327 


Treloar, A. E., 219, 453 
T-scale, 307-312; comparison with 
standard scores, 312-313; advantages 
of, 313 
t-test, meaning of, 190-192; comparison 
with CR, 993; in analysis of 
variance, 275, 283-284, 285; table of 
t (Table D), 427 


Validity: relation of, to reliability, 
344; measurement of, in a test, 344-349; 
in terms of criteria, 345-346; 
indirect measures of, 345-346; of 
test battery, 348-349 

Van Voorhis, W. R., 168, 365, 371 

Variability: meaning of, 42; measures 
of, 43; coefficient of relative vari- 
ability, 57-60; reliability of meas- 
ures of, 194-195. See also Mean de- 
viation, Quartile deviation, Range, 
Standard deviation 

Variance (σ²): analysis of, 268-273; 
components of, 274-281 


Walker, H. M., 128, 154, 191, 453 
Wherry, R. J., 407 
Wherry-Doolittle Test Selection 
Method, 404; illustration of, 405-418; 
shrinkage formula for R in, 
407; regression equations in, 416-417; 
beta weights and multiple R, 
417-418 

Wilks, S. S., 230 

Woo, T. L., 263 

Woodworth, R. S., 317 




Yates, F., 203, 454 
Yule, G. U., 86, 90, 197, 371, 395, 400 


z-function (Fisher), use in determining 
reliability of r, 198-199; significance 
of difference between two r's, 
239-240 

Zubin, J., 280 